Meet ‘AutoAgent’: The Open-Source Library That Lets an AI Engineer and Optimize Its Own Agent Harness Overnight

By NextTech, April 5, 2026. 6 min read.


There’s a particular kind of tedium that every AI engineer knows intimately: the prompt-tuning loop. You write a system prompt, run your agent against a benchmark, read the failure traces, tweak the prompt, add a tool, rerun. Repeat this a few dozen times and you might move the needle. It’s grunt work dressed up in Python files. Now, a new open-source library called AutoAgent, built by Kevin Gu at thirdlayer.inc, proposes an unsettling alternative: don’t do that work yourself. Let an AI do it.

AutoAgent is an open-source library for autonomously improving an agent on any domain. In a 24-hour run, it hit #1 on SpreadsheetBench with a score of 96.5%, and achieved the #1 GPT-5 score on TerminalBench with 55.1%.

https://x.com/kevingu/status/2039843234760073341

What Is AutoAgent, Really?

AutoAgent is described as being ‘like autoresearch but for agent engineering.’ The idea: give an AI agent a task, then let it build and iterate on an agent harness autonomously overnight. It modifies the system prompt, tools, agent configuration, and orchestration, runs the benchmark, checks the score, keeps or discards the change, and repeats.

To understand the analogy: Andrej Karpathy’s autoresearch does the same thing for ML training. It loops through propose-train-evaluate cycles, keeping only changes that improve validation loss. AutoAgent ports that same ratchet loop from ML training into agent engineering. Instead of optimizing a model’s weights or training hyperparameters, it optimizes the harness: the system prompt, tool definitions, routing logic, and orchestration strategy that determine how an agent behaves on a task.

A harness, in this context, is the scaffolding around an LLM: what system prompt it receives, what tools it can call, how it routes between sub-agents, and how tasks are formatted as inputs. Most agent engineers hand-craft this scaffolding. AutoAgent automates the iteration on that scaffolding itself.
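To make the four pieces concrete, here is a minimal Python sketch of a harness as a data structure. The `Harness` class, its field names, and `build_request` are hypothetical and chosen for illustration; they are not taken from the AutoAgent repo, and no model is actually called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Harness:
    """Hypothetical container for the four harness pieces named above."""

    system_prompt: str
    # Tool name -> callable the agent may invoke.
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    # Picks the sub-agent that should handle a task.
    route: Callable[[str], str] = lambda task: "default"
    # Turns a raw task description into the input the model sees.
    format_task: Callable[[str], str] = lambda task: task


def build_request(harness: Harness, task: str) -> dict:
    """Assemble what would be sent to an LLM for one task (no model call)."""
    return {
        "agent": harness.route(task),
        "system": harness.system_prompt,
        "tools": sorted(harness.tools),
        "input": harness.format_task(task),
    }


harness = Harness(
    system_prompt="You are a careful spreadsheet assistant.",
    tools={"read_cell": lambda ref: f"value of {ref}"},
    format_task=lambda task: f"Task: {task}",
)
request = build_request(harness, "Sum column A")
```

Every field here is something a person would normally tune by hand, which is why each one is also a candidate for automated edits.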

The Architecture: Two Agents, One File, One Directive

The GitHub repo has a deliberately simple structure. agent.py is the entire harness under test in a single file: it contains config, tool definitions, agent registry, routing/orchestration, and the Harbor adapter boundary. The adapter section is explicitly marked as fixed; the rest is the primary edit surface for the meta-agent. program.md contains instructions for the meta-agent plus the directive (what kind of agent to build), and this is the only file the human edits.

Think of it as a separation of concerns between human and machine. The human sets the direction inside program.md. The meta-agent (a separate, higher-level AI) then reads that directive, inspects agent.py, runs the benchmark, diagnoses what failed, rewrites the relevant parts of agent.py, and repeats. The human never touches agent.py directly.

A critical piece of infrastructure that keeps the loop coherent across iterations is results.tsv, an experiment log automatically created and maintained by the meta-agent. It tracks every experiment run, giving the meta-agent a history to learn from and calibrate what to try next. The full project structure also includes Dockerfile.base, an optional .agent/ directory for reusable agent workspace artifacts like prompts and skills, a tasks/ folder for benchmark payloads (added per benchmark branch), and a jobs/ directory for Harbor job outputs.
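As a rough illustration of how such a tab-separated experiment log could be appended to and read back, consider the sketch below. The column names, the file name, and both helper functions are assumptions made for this example; the article does not describe the real log's schema.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical columns; the real log's schema is not given in the article.
FIELDS = ["iteration", "change", "score", "kept"]


def log_experiment(path: Path, iteration: int, change: str, score: float, kept: bool) -> None:
    """Append one experiment row, writing the header if the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        if is_new:
            writer.writeheader()
        writer.writerow(
            {"iteration": iteration, "change": change, "score": score, "kept": kept}
        )


def best_kept_score(path: Path) -> float:
    """Return the highest score among kept experiments, or 0.0 if none."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return max((float(r["score"]) for r in rows if r["kept"] == "True"), default=0.0)


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "experiments.tsv"  # illustrative file name
    log_experiment(log_path, 1, "baseline prompt", 0.40, True)
    log_experiment(log_path, 2, "added a cell-reading tool", 0.55, True)
    log_experiment(log_path, 3, "shortened the prompt", 0.50, False)
    best = best_kept_score(log_path)
    row_count = len(log_path.read_text().splitlines())
```

An append-only log like this gives a later iteration two things: the score it has to beat, and a record of changes that were already tried and discarded.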

The metric is the total score produced by the benchmark’s task test suites. The meta-agent hill-climbs on this score. Every experiment produces a numeric score: keep the change if the score improves, discard it if not. It is the same loop as autoresearch.
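The keep-or-discard rule can be sketched in a few lines of Python. This is a toy under stated assumptions, not AutoAgent's implementation: the "harness" is just a prompt string, `propose` stands in for the meta-agent's edit, and `evaluate` stands in for a benchmark run. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def ratchet(
    harness: str,
    propose: Callable[[str], str],
    evaluate: Callable[[str], float],
    iterations: int,
) -> Tuple[str, float, List[Tuple[float, bool]]]:
    """Hill-climb: keep a candidate only if it scores strictly higher."""
    best, best_score = harness, evaluate(harness)
    history = []
    for _ in range(iterations):
        candidate = propose(best)
        score = evaluate(candidate)
        kept = score > best_score
        if kept:
            best, best_score = candidate, score
        history.append((score, kept))
    return best, best_score, history


# Toy stand-ins: the "benchmark" rewards prompts that mention each of
# three instructions; the "meta-agent" appends a random phrase.
WANTED = ("cite cells", "check units", "show formulas")
EDITS = WANTED + ("be verbose", "apologize often")
rng = random.Random(0)


def propose(prompt: str) -> str:
    return f"{prompt} {rng.choice(EDITS)}."


def evaluate(prompt: str) -> float:
    return sum(phrase in prompt for phrase in WANTED) / len(WANTED)


start = "You are a spreadsheet agent."
best, best_score, history = ratchet(start, propose, evaluate, iterations=20)
```

Because a candidate is accepted only on a strict improvement, the best score can never decrease, which is what makes the loop a ratchet. The trade-off is the usual one for hill-climbing: it can stall on a local optimum.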

The Task Format and Harbor Integration

Benchmarks are expressed as tasks in Harbor format. Each task lives under tasks/my-task/ and includes a task.toml for config like timeouts and metadata, an instruction.md which is the prompt sent to the agent, a tests/ directory with a test.sh entry point that writes a score to /logs/reward.txt, and a test.py for verification using either deterministic checks or LLM-as-judge. An environment/Dockerfile defines the task container, and a files/ directory holds reference files mounted into the container. Tests write a score between 0.0 and 1.0 to the verifier logs. The meta-agent hill-climbs on this.
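A deterministic verifier of the kind described above might look like the following Python sketch. The function names, the cell-comparison check, and the one-number-per-file format are assumptions for illustration; only the 0.0 to 1.0 range and the /logs/reward.txt path come from the article, and the demo writes to a temporary directory instead of that path.

```python
import tempfile
from pathlib import Path
from typing import Dict


def score_cells(expected: Dict[str, float], actual: Dict[str, float]) -> float:
    """Deterministic check: fraction of expected cells the agent got right."""
    if not expected:
        return 0.0
    correct = sum(1 for ref, value in expected.items() if actual.get(ref) == value)
    return correct / len(expected)


def write_reward(score: float, reward_path: Path) -> float:
    """Clamp the score to [0.0, 1.0] and write it as a single line."""
    clamped = min(1.0, max(0.0, score))
    reward_path.parent.mkdir(parents=True, exist_ok=True)
    reward_path.write_text(f"{clamped}\n")
    return clamped


expected = {"A1": 10.0, "A2": 20.0, "A3": 30.0, "A4": 60.0}
actual = {"A1": 10.0, "A2": 20.0, "A3": 31.0, "A4": 60.0}

with tempfile.TemporaryDirectory() as tmp:
    # A real task would target /logs/reward.txt; a temp dir keeps the demo safe.
    reward_file = Path(tmp) / "logs" / "reward.txt"
    written = write_reward(score_cells(expected, actual), reward_file)
    contents = reward_file.read_text()
```

Reducing every task to one number in a known location is what lets an outer loop compare runs without knowing anything about the task itself.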

The LLM-as-judge pattern here is worth flagging: instead of only checking answers deterministically (like unit tests), the test suite can use another LLM to evaluate whether the agent’s output is ‘correct enough.’ This is common in agentic benchmarks where correct answers aren’t reducible to string matching.
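A minimal Python sketch of the judge pattern follows. To stay self-contained it takes the model as a plain callable, `ask_llm`, and the demo passes in stub functions rather than calling any real API. The prompt wording, function names, and the choice to score unparseable replies as 0.0 are all assumptions for illustration.

```python
from typing import Callable

# Hypothetical judge prompt; real benchmarks will word this differently.
JUDGE_PROMPT = (
    "Question: {question}\n"
    "Reference answer: {reference}\n"
    "Agent answer: {answer}\n"
    "Reply with only a number from 0.0 to 1.0 for how correct the agent answer is."
)


def judge_score(
    question: str,
    answer: str,
    reference: str,
    ask_llm: Callable[[str], str],
) -> float:
    """Ask a judge model for a score and clamp it to [0.0, 1.0]."""
    prompt = JUDGE_PROMPT.format(question=question, reference=reference, answer=answer)
    reply = ask_llm(prompt).strip()
    try:
        score = float(reply)
    except ValueError:
        # An unparseable reply is treated as a failed check in this sketch.
        return 0.0
    return min(1.0, max(0.0, score))


def stub_judge(prompt: str) -> str:
    """Stand-in for a real model call; always returns the same score."""
    return "0.8"


partial = judge_score("What is 2 + 2?", "about four", "4", stub_judge)
garbled = judge_score("What is 2 + 2?", "about four", "4", lambda p: "looks fine")
too_high = judge_score("What is 2 + 2?", "4", "4", lambda p: "1.7")
```

Clamping and the parse fallback matter because the judge is itself a model: its reply is untrusted text until it has been converted into a number in the expected range.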

Key Takeaways

  • Autonomous harness engineering works: AutoAgent shows that a meta-agent can replace the human prompt-tuning loop entirely, iterating on agent.py overnight without any human touching the harness files directly.
  • Benchmark results validate the approach: In a 24-hour run, AutoAgent hit #1 on SpreadsheetBench (96.5%) and the top GPT-5 score on TerminalBench (55.1%), beating every other entry that was hand-engineered by humans.
  • ‘Model empathy’ may be a real phenomenon: A Claude meta-agent optimizing a Claude task agent seemed to diagnose failures more accurately than when optimizing a GPT-based agent, suggesting same-family model pairing could matter when designing your AutoAgent loop.
  • The human’s job shifts from engineer to director: You don’t write or edit agent.py. You write program.md, a plain Markdown directive that steers the meta-agent. The distinction mirrors the broader shift in agentic engineering from writing code to setting goals.
  • It’s plug-and-play with any benchmark: Because tasks follow Harbor’s open format and agents run in Docker containers, AutoAgent is domain-agnostic. Any scorable task (spreadsheets, terminal commands, or your own custom domain) can become a target for autonomous self-optimization.
