Stanford Researchers Released AgentFlow: In-the-Flow Reinforcement Learning (RL) for Modular, Tool-Using AI Agents

By NextTech, October 9, 2025


TL;DR: AgentFlow is a trainable agent framework with four modules (Planner, Executor, Verifier, Generator) coordinated by an explicit memory and toolset. The planner is optimized in the loop with a new on-policy method, Flow-GRPO, which broadcasts a trajectory-level outcome reward to every turn and applies token-level PPO-style updates with KL regularization and group-normalized advantages. On ten benchmarks, a 7B backbone tuned with Flow-GRPO reports +14.9% (search), +14.0% (agentic), +14.5% (math), and +4.1% (science) over strong baselines.

What is AgentFlow?

AgentFlow formalizes multi-turn, tool-integrated reasoning as a Markov Decision Process (MDP). At each turn, the Planner proposes a sub-goal and selects a tool plus context; the Executor calls the tool; the Verifier signals whether to continue; the Generator emits the final answer on termination. A structured, evolving memory records states, tool calls, and verification signals, constraining context growth and making trajectories auditable. Only the planner is trained; the other modules can be fixed engines.
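The turn structure described above can be sketched as a short loop. This is a minimal illustration, not the AgentFlow implementation: the `Memory` class, the `run_agent` function, the module signatures, and the toy `adder` tool are all hypothetical names chosen for this example, and in the real system the Planner, Verifier, and Generator would be LLM-backed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Memory:
    """Structured record of every turn: sub-goal, tool call, result, verifier signal."""
    records: List[dict] = field(default_factory=list)

    def add(self, **record) -> None:
        self.records.append(record)


# Hypothetical module signatures (assumptions for this sketch).
Planner = Callable[[str, Memory], Tuple[str, str, str]]  # -> (sub_goal, tool_name, tool_input)
Verifier = Callable[[str, Memory], bool]                 # -> True when the agent should stop
Generator = Callable[[str, Memory], str]                 # -> final answer


def run_agent(query: str, planner: Planner, tools: Dict[str, Callable[[str], str]],
              verifier: Verifier, generator: Generator, max_turns: int = 5):
    memory = Memory()
    for turn in range(max_turns):
        # Planner: propose a sub-goal and pick a tool plus its input.
        sub_goal, tool_name, tool_input = planner(query, memory)
        # Executor: call the selected tool.
        result = tools[tool_name](tool_input)
        memory.add(turn=turn, sub_goal=sub_goal, tool=tool_name,
                   tool_input=tool_input, result=result)
        # Verifier: decide whether to stop, and log the signal in memory.
        done = verifier(query, memory)
        memory.records[-1]["verified"] = done
        if done:
            break
    # Generator: produce the final answer from the accumulated memory.
    return generator(query, memory), memory


# Toy stand-ins so the loop can be run end to end.
def toy_planner(query, memory):
    return ("add the numbers", "adder", query)

def toy_verifier(query, memory):
    return memory.records[-1]["result"] != ""

def toy_generator(query, memory):
    return memory.records[-1]["result"]

tools = {"adder": lambda text: str(sum(int(x) for x in text.split("+")))}
answer, memory = run_agent("2+3", toy_planner, tools, toy_verifier, toy_generator)
print(answer)  # 5
```

Because every turn is appended to `memory.records`, the full trajectory can be inspected after the run, which is the auditability property the article describes.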

The public implementation showcases a modular toolkit (e.g., base_generator, python_coder, google_search, wikipedia_search, web_search) and ships quick-start scripts for inference, training, and benchmarking. The repository is MIT-licensed.

Figure source: https://arxiv.org/pdf/2510.05592

Training method: Flow-GRPO

Flow-GRPO (Flow-based Group Refined Policy Optimization) converts long-horizon, sparse-reward optimization into tractable single-turn updates:

  • Final-outcome reward broadcast: a single, verifiable trajectory-level signal (LLM-as-judge correctness) is assigned to every turn, aligning local planning steps with global success.
  • Token-level clipped objective: importance-weighted ratios are computed per token, with PPO-style clipping and a KL penalty to a reference policy to prevent drift.
  • Group-normalized advantages: variance reduction across groups of on-policy rollouts stabilizes updates.

Understanding the results and benchmarks

Benchmarks. The research team evaluates four task types: knowledge-intensive search (Bamboogle, 2Wiki, HotpotQA, Musique), agentic reasoning (GAIA textual split), math (AIME-24, AMC-23, Game of 24), and science (GPQA, MedQA). GAIA is a tooling-oriented benchmark for general assistants; the textual split excludes multimodal requirements.
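The grouping above accounts for the "ten benchmarks" mentioned in the summary, as the small sketch below shows. The `BENCHMARKS` mapping and the `category_average` helper are hypothetical names for this example, and the unweighted per-category mean is an assumption: the article says only that the reported gains are averages.

```python
# Benchmark names as listed in the article, grouped by task type.
BENCHMARKS = {
    "search": ["Bamboogle", "2Wiki", "HotpotQA", "Musique"],
    "agentic": ["GAIA (textual split)"],
    "math": ["AIME-24", "AMC-23", "Game of 24"],
    "science": ["GPQA", "MedQA"],
}


def category_average(scores, category):
    """Unweighted mean over one category's benchmarks (assumed aggregation)."""
    names = BENCHMARKS[category]
    return sum(scores[name] for name in names) / len(names)


total = sum(len(names) for names in BENCHMARKS.values())
print(total)  # 10
```

Supplying a `scores` dictionary keyed by benchmark name would give one number per task type, the same granularity at which the gains are reported.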

Main numbers (7B backbone after Flow-GRPO). Average gains over strong baselines: +14.9% (search), +14.0% (agentic), +14.5% (math), +4.1% (science). The research team states that their 7B system surpasses GPT-4o on the reported suite. The project page also reports training effects such as improved planning quality, reduced tool-calling errors (by up to 28.4% on GAIA), and positive trends with larger turn budgets and model scale.

Ablations. Online Flow-GRPO improves performance by +17.2% vs. a frozen-planner baseline, while offline supervised fine-tuning of the planner degrades performance by −19.0% on their composite metric.


Key Takeaways

  • Modular agent, planner-only training. AgentFlow structures an agent into Planner–Executor–Verifier–Generator with an explicit memory; only the Planner is trained in-loop.
  • Flow-GRPO converts long-horizon RL to single-turn updates. A trajectory-level outcome reward is broadcast to every turn; updates use token-level PPO-style clipping with KL regularization and group-normalized advantages.
  • Reported gains on 10 benchmarks. With a 7B backbone, AgentFlow reports average improvements of +14.9% (search), +14.0% (agentic/GAIA textual), +14.5% (math), +4.1% (science) over strong baselines, and the team states that it surpasses GPT-4o on the same suite.
  • Tool-use reliability improves. The research team reports reduced tool-calling errors (e.g., on GAIA) and better planning quality under larger turn budgets and model scale.

AgentFlow formalizes tool-using agents into four modules (planner, executor, verifier, generator) and trains only the planner in-loop via Flow-GRPO, which broadcasts a single trajectory-level reward to every turn with token-level PPO-style updates and KL control. Reported results on ten benchmarks show average gains of +14.9% (search), +14.0% (agentic/GAIA textual split), +14.5% (math), and +4.1% (science); the research team additionally states that the 7B system surpasses GPT-4o on this suite. Implementation, tools, and quick-start scripts are MIT-licensed in the GitHub repo.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
