New ETH Zurich Study Proves Your AI Coding Agents Are Failing Because Your AGENTS.md Files Are Too Detailed

In the high-stakes world of AI, ‘Context Engineering’ has emerged as the latest frontier for squeezing performance out of LLMs. Industry leaders have touted AGENTS.md (and its cousins like CLAUDE.md) as the ultimate configuration point for coding agents: a repository-level ‘North Star’ injected into every conversation to guide the AI through complex codebases.

But a recent study from researchers at ETH Zurich just delivered a massive reality check. The findings are quite clear: if you aren’t deliberate with your context files, you are likely sabotaging your agent’s performance while paying a 20% premium for the privilege.

Figure from the paper: https://arxiv.org/pdf/2602.11988

The Data: More Tokens, Less Success

The ETH Zurich research team analyzed coding agents like Sonnet-4.5, GPT-5.2, and Qwen3-30B across established benchmarks and a novel set of real-world tasks called AGENTBENCH. The results were surprisingly lopsided:

The Auto-Generated Tax: Automatically generated context files actually reduced success rates by roughly 3%.
The Cost of ‘Help’: These files increased inference costs by over 20% and required more reasoning steps to solve the same tasks.
The Human Margin: Even human-written files only provided a marginal 4% performance gain.
The Intelligence Cap: Interestingly, using stronger models (like GPT-5.2) to generate these files did not yield better results. Stronger models often have enough ‘parametric knowledge’ of common libraries that the extra context becomes redundant noise.
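To see how the two headline numbers compound, here is a rough back-of-the-envelope sketch. The baseline success rate and per-task cost are hypothetical placeholders, not figures from the study, and the sketch assumes the "roughly 3%" drop is in percentage points and the "over 20%" cost increase is relative, so 1.20 is a lower bound on the multiplier.

```python
def cost_per_success(success_rate: float, cost_per_task: float) -> float:
    """Expected spend per solved task: cost of one attempt divided by its success probability."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_task / success_rate


# Hypothetical baseline, NOT from the study: 50% success at $1.00 per task.
BASELINE_SUCCESS = 0.50
BASELINE_COST = 1.00

# Figures as reported above, under the assumptions stated in the lead-in:
# about 3 percentage points lower success, at least 20% higher cost.
auto_success = BASELINE_SUCCESS - 0.03
auto_cost = BASELINE_COST * 1.20

baseline = cost_per_success(BASELINE_SUCCESS, BASELINE_COST)
with_auto_file = cost_per_success(auto_success, auto_cost)

print(f"no context file:     ${baseline:.2f} per solved task")
print(f"auto-generated file: ${with_auto_file:.2f} per solved task")
```

Under these assumed inputs, the cost per solved task rises by more than the 20% headline figure, because the higher spend is divided by a lower success rate.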

Why ‘Good’ Context Fails

The research team highlights a behavioral trap: AI agents are too obedient. Coding agents tend to respect the instructions found in context files, but when those requirements are unnecessary, they make the task harder.

For instance, the researchers found that codebase overviews and directory listings, a staple of most AGENTS.md files, did not help agents navigate faster. Agents are surprisingly good at discovering file structures on their own; reading a manual listing just consumes reasoning tokens and adds ‘mental’ overhead. Furthermore, LLM-generated files are often redundant if you already have decent documentation elsewhere in the repo.

Figure from the paper: https://arxiv.org/pdf/2602.11988

The New Rules of Context Engineering

To make context files actually helpful, you need to shift from ‘comprehensive documentation’ to ‘surgical intervention.’

1. What to Include (The ‘Vital Few’)

The Technical Stack & Intent: Explain the ‘What’ and the ‘Why.’ Help the agent understand the purpose of the project and its architecture (e.g., a monorepo structure).
Non-Obvious Tooling: This is where AGENTS.md shines. Specify how to build, test, and verify changes using specific tools like uv instead of pip or bun instead of npm.
The Multiplier Effect: The data shows that instructions are followed; tools mentioned in a context file are used significantly more often. For example, the tool uv was used 160x more frequently (1.6 times per instance vs. 0.01) when explicitly mentioned.

2. What to Exclude (The ‘Noise’)

Detailed Directory Trees: Skip them. Agents can find the files they need without a map.
Style Guides: Don’t waste tokens telling an agent to “use camelCase.” Use deterministic linters and formatters instead; they’re cheaper, faster, and more reliable.
Task-Specific Instructions: Avoid rules that only apply to a fraction of your issues.
Unvetted Auto-Content: Don’t let an agent write its own context file without a human review. The study found that ‘stronger’ models don’t necessarily make better guides.
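One of these exclusions is easy to check mechanically. The sketch below flags lines in a context file that look like part of a pasted directory tree. The function name `find_tree_lines` and the regular expression are illustrative assumptions, not something from the study; the pattern only recognizes the common box-drawing and ASCII connectors, so treat it as a heuristic.

```python
import re

# Connectors typically seen in pasted directory listings (Unicode and ASCII variants).
# This pattern is a heuristic chosen for illustration, not an exhaustive grammar.
TREE_LINE = re.compile(r"^\s*(?:[│|]\s*)*(?:├──|└──|\|--|`--)\s*\S")


def find_tree_lines(text: str) -> list[int]:
    """Return 1-based line numbers that look like part of a directory tree."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if TREE_LINE.match(line)
    ]


sample = """\
# Project
Use uv to run tests.
src/
├── api/
│   └── routes.py
└── cli.py
"""

print(find_tree_lines(sample))
```

A check like this could run in review or CI so that directory maps are caught before they are injected into every agent session.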

3. How to Structure It

Keep it Lean: The general consensus for high-performance context files is under 300 lines. Professional teams often keep theirs even tighter, under 60 lines. Every line counts because every line is injected into every session.
Progressive Disclosure: Don’t put everything in the root file. Use the main file to point the agent to separate, task-specific documentation (e.g., agent_docs/testing.md) only when relevant.
Pointers Over Copies: Instead of embedding code snippets that will eventually go stale, use pointers (e.g., file:line) to show the agent where to find design patterns or specific interfaces.
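The line budget and the pointer convention above can both be checked with a small script. The following is a minimal sketch, assuming pointers are written as a relative path with a file extension, then a colon and a line number (for example `src/app.py:12`). The function name, the regular expression, and the warning wording are all hypothetical choices for illustration.

```python
import re
from pathlib import Path

# Budgets taken from the guidance above: under 300 lines, ideally under 60.
HARD_LIMIT = 300
TIGHT_LIMIT = 60

# Assumed pointer syntax: relative path with an extension, a colon, a line number.
POINTER = re.compile(r"(?<![\w/.-])([\w./-]+\.\w+):(\d+)\b")


def check_context_file(text: str, repo_root: Path) -> list[str]:
    """Return human-readable warnings for the contents of one context file."""
    warnings = []

    line_count = len(text.splitlines())
    if line_count >= HARD_LIMIT:
        warnings.append(f"{line_count} lines: at or over the {HARD_LIMIT}-line budget")
    elif line_count >= TIGHT_LIMIT:
        warnings.append(f"{line_count} lines: over the tighter {TIGHT_LIMIT}-line target")

    # Verify that every file:line pointer still resolves to a real line.
    for path_text, line_text in POINTER.findall(text):
        target = repo_root / path_text
        if not target.is_file():
            warnings.append(f"{path_text}:{line_text} points at a missing file")
            continue
        contents = target.read_text(encoding="utf-8", errors="replace")
        available = len(contents.splitlines())
        if not 1 <= int(line_text) <= available:
            warnings.append(
                f"{path_text}:{line_text} is outside the file ({available} lines)"
            )
    return warnings


if __name__ == "__main__":
    root = Path(".")
    agents_file = root / "AGENTS.md"
    if agents_file.is_file():
        for warning in check_context_file(agents_file.read_text(encoding="utf-8"), root):
            print("warning:", warning)
```

Pointers go stale too, just more quietly than copied snippets, so a check that each one still resolves is a cheap safeguard. The pattern will also match unrelated `name.ext:123` strings such as host:port pairs, so expect some false positives.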

Key Takeaways

Negative Impact of Auto-Generation: LLM-generated context files tend to reduce task success rates by roughly 3% on average compared to providing no repository context at all.
Significant Cost Increases: Including context files increases inference costs by over 20% and leads to a higher number of steps required for agents to complete tasks.
Minimal Human Benefit: While human-written (developer-provided) context files perform better than auto-generated ones, they only offer a marginal improvement of about 4% over using no context files.
Redundancy and Navigation: Detailed codebase overviews in context files are largely redundant with existing documentation and do not help agents find relevant files any faster.
Strict Instruction Following: Agents generally respect the instructions in these files, but unnecessary or overly restrictive requirements often make solving real-world tasks harder for the model.
