A New Agency-Focused Supervision Approach Scales Software AI Agents With Only 78 Examples

By NextTech · October 6, 2025 · 4 min read
Do curated, tool-grounded demonstrations build stronger software agents than broad piles of generic instruction data? A team of researchers from Shanghai Jiao Tong University and SII Generative AI Research Lab (GAIR) proposes LIMI ("Less Is More for Agency"), a supervised fine-tuning method that turns a base model into a capable software/research agent using only 78 samples. LIMI scores a 73.5% average on AgencyBench (FTFC 71.7, RC@3 74.2, SR@3 74.6), beating strong baselines (GLM-4.5 45.1, Qwen3-235B-A22B 27.5, Kimi-K2 24.1, DeepSeek-V3.1 11.9) and even surpassing variants trained on 10,000 samples, with 128× less data.
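The 73.5% headline figure matches the unweighted mean of the three AgencyBench metrics quoted above, and the 128× figure matches the ratio of 10,000 to 78 samples. The sketch below checks that arithmetic; treating the AgencyBench average as a simple mean of FTFC, RC@3, and SR@3 is our assumption, which the reported numbers happen to agree with.

```python
# Reported LIMI scores on AgencyBench (percent), as quoted in the article.
limi = {"FTFC": 71.7, "RC@3": 74.2, "SR@3": 74.6}


def agencybench_average(scores: dict[str, float]) -> float:
    """Unweighted mean of per-metric scores (assumed aggregation), rounded to 0.1."""
    return round(sum(scores.values()) / len(scores), 1)


# Ratio between the 10,000-sample SFT baseline and LIMI's 78 training samples.
data_ratio = 10_000 / 78

print(agencybench_average(limi))  # 73.5
print(round(data_ratio))          # 128
```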

[Figure: screenshot from the LIMI paper, https://arxiv.org/pdf/2509.17567]

What exactly is new?

  • Agency Efficiency Principle: LIMI states that agentic competence scales more with data quality and structure than with raw sample count. The research team fine-tunes GLM-4.5 and GLM-4.5-Air on 78 long-horizon, tool-use trajectories (samples) and reports large gains on AgencyBench and on generalization suites (TAU2-bench, EvalPlus-HE/MBPP, DS-1000, SciCode).
  • Minimal but dense supervision: Each trajectory (~13k–152k tokens; ~42.4k on average) captures a full multi-turn workflow (model reasoning, tool calls, and environment observations) collected in the SII-CLI execution environment. Tasks span "vibe coding" (interactive software development) and research workflows (search, analysis, experiment design).
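To make "minimal but dense" concrete, a trajectory can be pictured as an ordered list of turns of three kinds: model reasoning, tool calls, and environment observations. The record layout below (the `Turn` and `Trajectory` classes and their fields) is a hypothetical illustration, not the released data format. The final line uses only the article's figures (78 trajectories at roughly 42.4k tokens each), which put the whole training set at about 3.3M tokens.

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical turn kinds mirroring the three components named in the article.
TurnKind = Literal["reasoning", "tool_call", "observation"]


@dataclass
class Turn:
    kind: TurnKind
    content: str
    n_tokens: int  # assumed to be precomputed by some tokenizer


@dataclass
class Trajectory:
    query: str
    turns: list[Turn] = field(default_factory=list)

    def total_tokens(self) -> int:
        """Sum of token counts over every turn in the workflow."""
        return sum(t.n_tokens for t in self.turns)


# Toy trajectory; all values are invented for illustration.
traj = Trajectory(
    query="Add a --verbose flag to the CLI",
    turns=[
        Turn("reasoning", "Find where arguments are parsed.", 12),
        Turn("tool_call", "grep -rn argparse src/", 8),
        Turn("observation", "src/main.py:3:import argparse", 10),
    ],
)
print(traj.total_tokens())  # 30

# Rough size of the full training set from the article's averages.
approx_total = 78 * 42_400
print(approx_total)  # 3307200
```

A real LIMI trajectory would contain many such turns; the point is that each sample is a whole workflow rather than a single prompt/response pair.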

How does it work?

  • Base models: GLM-4.5 (355B) and GLM-4.5-Air (106B). Training uses the slime SFT framework with identical configurations across comparisons, to isolate the effect of the data.
  • Data construction: 60 real queries from practitioners plus 18 synthesized from high-star GitHub PRs (with tight QA by PhD annotators). For each query, LIMI logs the full agent trajectory through to successful completion within SII-CLI.
  • Evaluation: AgencyBench (R=3 rounds) with FTFC, SR@3, and RC@3; plus generalization suites (TAU2-airline/retail Pass^4, EvalPlus HE/MBPP, DS-1000, SciCode).
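The TAU2 airline/retail scores use the pass^k family of metrics, with k = 4. In the τ-bench line of work, pass^k is the probability that all k independent trials of a task succeed, estimated per task as C(c, k) / C(n, k) from n trials with c successes, then averaged over tasks. The sketch below implements that estimator; the trial counts in the usage example are invented, and whether LIMI's evaluation used exactly this estimator is an assumption.

```python
from math import comb


def pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Estimate pass^k as the mean over tasks of C(c, k) / C(n, k).

    results: one (n_trials, n_successes) pair per task, with n_trials >= k.
    """
    total = 0.0
    for n, c in results:
        if n < k:
            raise ValueError("need at least k trials per task")
        total += comb(c, k) / comb(n, k)  # comb(c, k) is 0 when c < k
    return total / len(results)


# Hypothetical: three tasks, each run 4 times, with 4, 3 and 2 successes.
tasks = [(4, 4), (4, 3), (4, 2)]
print(pass_hat_k(tasks, k=1))            # 0.75
print(round(pass_hat_k(tasks, k=4), 3))  # 0.333
```

With n = k = 4, pass^4 reduces to the fraction of tasks solved in all four runs, which is why it penalizes inconsistent agents far more than an average success rate does.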

Results

  • AgencyBench (avg): 73.5%. LIMI vs. GLM-4.5: +28.4 pts; FTFC 71.7% vs 37.8%; SR@3 74.6% vs 47.4%.
  • Data efficiency: LIMI (78 samples) outperforms GLM-4.5 trained with AFM-CodeAgent SFT (10,000 samples): 73.5% vs 47.8%, a 25.7-point absolute gain (+53.7% relative) with 128× less data. Similar gaps hold against AFM-WebAgent (7,610 samples) and CC-Bench-Traj (260 samples).
  • Generalization: Across tool-use, coding, and scientific-computing benchmarks, LIMI averages ~57%, exceeding GLM-4.5 and other baselines; without tool access, LIMI still leads slightly (50.0% vs 48.7% for GLM-4.5), indicating intrinsic gains beyond environment tooling.
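The headline comparisons mix absolute (percentage-point) and relative gains, which is easy to misread. The small helper below separates the two using only the scores reported above. From these rounded scores the relative gain over the 10,000-sample baseline comes out at about 53.8%, in line with the reported 53.7%.

```python
def gains(new: float, old: float) -> tuple[float, float]:
    """Return (absolute gain in points, relative gain in percent), each rounded to 0.1."""
    absolute = new - old
    relative = 100 * absolute / old
    return round(absolute, 1), round(relative, 1)


# LIMI vs. base GLM-4.5 on AgencyBench.
print(gains(73.5, 45.1))  # (28.4, 63.0)
# LIMI vs. GLM-4.5 fine-tuned on 10,000 AFM-CodeAgent samples.
print(gains(73.5, 47.8))  # (25.7, 53.8)
# Tool-free generalization: LIMI vs. GLM-4.5.
print(gains(50.0, 48.7))  # (1.3, 2.7)
```

Reporting both numbers side by side avoids presenting a relative improvement as if it were a point difference.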

Key Takeaways

  1. Data efficiency beats scale. LIMI reaches a 73.5% average on AgencyBench using curated trajectories, surpassing GLM-4.5 (45.1%) and showing a 25.7-point advantage (+53.7% relative) over a 10k-sample SFT baseline, with 128× fewer samples.
  2. Trajectory quality, not bulk. The training data are long-horizon, tool-grounded workflows in collaborative software development and scientific research, collected via the SII-CLI execution stack referenced by the paper.
  3. Cross-metric gains. On AgencyBench, LIMI reports FTFC 71.7%, SR@3 74.6%, and RC@3 74.2%, with detailed tables showing large margins over baselines; on the generalization suites (TAU2, EvalPlus-HE/MBPP, DS-1000, SciCode) it averages 57.2%.
  4. Works across scales. Fine-tuning both GLM-4.5 (355B) and GLM-4.5-Air (106B) yields large gains over their respective bases, suggesting the method is robust to model size.

The research team trains GLM-4.5 variants on 78 curated, long-horizon, tool-grounded trajectories captured in a CLI environment and spanning software-engineering and research tasks. It reports a 73.5% average on AgencyBench across the FTFC, RC@3, and SR@3 metrics; baseline GLM-4.5 is reported at 45.1%. A comparison against a 10,000-sample AFM-CodeAgent SFT baseline shows 73.5% vs 47.8%, and tool-free evaluation indicates intrinsic gains (≈50.0% for LIMI vs 48.7% for GLM-4.5). The trajectories are multi-turn and token-dense, emphasizing planning, tool orchestration, and verification.


Check out the Paper, GitHub Page and Model Card on HF.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.

2025 © NextTech-News. All Rights Reserved