Meta and Harvard Researchers Introduce the Confucius Code Agent (CCA): A Software Engineering Agent That Can Operate at Large-Scale Codebases

By NextTech | January 9, 2026 | 7 min read

How far can a mid-sized language model go if the real innovation moves from the backbone into the agent scaffold and tool stack? Meta and Harvard researchers have released the Confucius Code Agent, an open-source AI software engineer built on the Confucius SDK and designed for industrial-scale software repositories and long-running sessions. The system targets real GitHub projects, complex test toolchains at evaluation time, and reproducible results on benchmarks such as SWE-Bench Pro and SWE-Bench Verified, while exposing the full scaffold for developers.

[Figure omitted. Source: https://arxiv.org/pdf/2512.10398]

Confucius SDK, scaffolding around the model

The Confucius SDK is an agent development platform that treats scaffolding as a primary design problem rather than a thin wrapper around a language model. It is organized around 3 axes: Agent Experience, User Experience, and Developer Experience.

Agent Experience controls what the model sees, including context structure, working memory and tool results. User Experience focuses on readable traces, code diffs and safeguards for human engineers. Developer Experience focuses on observability, configuration and debugging of the agent itself.

The SDK introduces 3 core mechanisms: a unified orchestrator with hierarchical working memory, a persistent note-taking system, and a modular extension interface for tools. A meta agent then automates the synthesis and refinement of agent configurations through a build, test, improve loop. The Confucius Code Agent is one concrete instantiation of this scaffold for software engineering.

Hierarchical working memory for long-horizon coding

Real software tasks on SWE-Bench Pro often require reasoning over dozens of files and many interaction steps. The orchestrator in the Confucius SDK maintains hierarchical working memory, which partitions a trajectory into scopes, summarizes past steps and keeps compressed context for later turns.

This design helps keep prompts within model context limits while preserving important artifacts such as patches, error logs and design decisions. The key point is that effective tool-based coding agents need an explicit memory architecture, not just a sliding window of previous messages.
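To make the idea concrete, here is a minimal Python sketch of scoped memory with summarization. It is not the Confucius SDK's implementation: the class names `Scope` and `HierarchicalMemory`, the method names, and the trivial string summarizer are all assumptions made for illustration. A real system would presumably summarize with a language model and enforce a token budget.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Scope:
    """One segment of a trajectory, e.g. 'reproduce bug' or 'write patch' (hypothetical)."""
    name: str
    steps: list[str] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)  # patches, error logs, decisions
    summary: Optional[str] = None  # filled in when the scope is closed


class HierarchicalMemory:
    """Keeps the active scope in full and earlier scopes in compressed form."""

    def __init__(self) -> None:
        self.scopes: list[Scope] = []

    def open_scope(self, name: str) -> None:
        # Opening a new scope closes (and compresses) the previous one.
        self.close_scope()
        self.scopes.append(Scope(name))

    def add_step(self, text: str, artifact: bool = False) -> None:
        scope = self.scopes[-1]
        scope.steps.append(text)
        if artifact:
            scope.artifacts.append(text)

    def close_scope(self) -> None:
        if self.scopes and self.scopes[-1].summary is None:
            scope = self.scopes[-1]
            # Placeholder summarizer (assumption): only records the step count.
            scope.summary = f"[{scope.name}] {len(scope.steps)} steps"

    def render(self) -> str:
        """Build the context: summaries plus artifacts for closed scopes, full steps for the open one."""
        parts: list[str] = []
        for scope in self.scopes:
            if scope.summary is None:
                parts.extend(scope.steps)
            else:
                parts.append(scope.summary)
                parts.extend(scope.artifacts)
        return "\n".join(parts)
```

In this sketch, a closed scope contributes only its summary and the steps explicitly marked as artifacts, so an error log survives compression while routine tool chatter does not. That is the difference from a sliding window, which would drop old content regardless of its importance.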

Persistent note taking for cross-session learning

The second mechanism is a note-taking system that uses a dedicated agent to write structured Markdown notes from execution traces. These notes capture task-specific strategies, repository conventions and common failure modes, and they are stored as long-term memory that can be reused across sessions.

The research team ran Confucius Code Agent twice on 151 SWE-Bench Pro instances with Claude 4.5 Sonnet. On the first run the agent solves tasks from scratch and generates notes. On the second run the agent reads those notes. In this setting, average turns drop from 64 to 61, token usage drops from about 104k to 93k, and Resolve@1 improves from 53.0 to 54.4. This shows that the notes are not just logs; they function as effective cross-session memory.
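The two-run setup can be pictured with a small file-backed note store. The sketch below is an assumption-heavy illustration, not the paper's note-taking agent: the function names `write_note`, `load_notes` and `build_prompt`, the one-file-per-repository layout, and the prompt wording are all hypothetical, and the notes are written by hand rather than by a dedicated agent.

```python
from pathlib import Path


def write_note(notes_dir: Path, repo: str, title: str, bullets: list[str]) -> Path:
    """Append one Markdown section of lessons learned for a repository."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{repo}.md"
    lines = [f"## {title}", *[f"- {b}" for b in bullets], ""]
    with path.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
    return path


def load_notes(notes_dir: Path, repo: str) -> str:
    """Return stored notes for a repository, or an empty string on the first session."""
    path = notes_dir / f"{repo}.md"
    return path.read_text(encoding="utf-8") if path.exists() else ""


def build_prompt(task: str, notes: str) -> str:
    """Prepend earlier notes to the task so a later session starts with prior context."""
    if not notes:
        return task
    return f"Notes from earlier sessions:\n{notes}\nTask:\n{task}"
```

On a first session `load_notes` returns an empty string and the agent sees only the task. After `write_note` has stored something like a repository's test command, a second session receives that note up front, which is one plausible reason a second run would need fewer turns and tokens.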

Modular extensions and tool-use sophistication

The Confucius SDK exposes tools as extensions, for example file editing, command execution, test runners and code search. Each extension can maintain its own state and prompt wiring.

The research team studies the impact of tool-use sophistication using an ablation on a 100-example subset of SWE-Bench Pro. With Claude 4 Sonnet, moving from a configuration without advanced context features to one with advanced context raises Resolve@1 from 42.0 to 48.6. With Claude 4.5 Sonnet, a simple tool-use configuration reaches 44.0, while richer tool handling reaches 51.6, with 51.0 for an intermediate variant. These numbers indicate that how the agent chooses and sequences tools matters almost as much as the choice of backbone model.
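As an illustration of tools packaged as extensions with their own state and prompt wiring, consider the following Python sketch. The interface is hypothetical: `Extension`, `CodeSearch`, `Orchestrator`, the `prompt_fragment` method and the `"name argument"` call format are assumptions for this example and are not taken from the Confucius SDK.

```python
from abc import ABC, abstractmethod


class Extension(ABC):
    """A tool plus the prompt text that describes it to the model (hypothetical interface)."""
    name: str

    @abstractmethod
    def prompt_fragment(self) -> str:
        """Text added to the system prompt so the model knows how to call the tool."""

    @abstractmethod
    def run(self, argument: str) -> str:
        """Execute the tool and return its result as text."""


class CodeSearch(Extension):
    name = "search"

    def __init__(self, files: dict[str, str]) -> None:
        self.files = files            # path -> file contents
        self.queries: list[str] = []  # extension-local state

    def prompt_fragment(self) -> str:
        return "search <text>: list files containing <text>"

    def run(self, argument: str) -> str:
        self.queries.append(argument)
        hits = sorted(path for path, body in self.files.items() if argument in body)
        return "\n".join(hits) or "no matches"


class Orchestrator:
    """Routes model tool calls of the form '<name> <argument>' to registered extensions."""

    def __init__(self, extensions: list[Extension]) -> None:
        self.extensions = {ext.name: ext for ext in extensions}

    def system_prompt(self) -> str:
        return "\n".join(ext.prompt_fragment() for ext in self.extensions.values())

    def dispatch(self, call: str) -> str:
        name, _, argument = call.partition(" ")
        ext = self.extensions.get(name)
        return ext.run(argument) if ext else f"unknown tool: {name}"
```

Because each extension owns both its state and its prompt text, swapping a simple tool configuration for a richer one only changes the list passed to `Orchestrator`, which is the kind of variation the ablation above compares.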

Meta agent for automatic agent design

On top of these mechanisms, the Confucius SDK includes a meta agent that takes a natural language specification of an agent and iteratively proposes configurations, prompts and extension sets. It then runs the candidate agent on tasks, inspects traces and metrics, and edits the configuration in a build, test, improve loop.

The Confucius Code Agent that the research team evaluates is produced with the help of this meta agent, rather than only hand tuned. This approach turns part of the agent engineering process itself into an LLM-guided optimization problem.
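The build, test, improve loop can be sketched as a simple search over configurations. This is a toy under stated assumptions, not the meta agent itself: the `propose` callback stands in for an LLM that suggests new configurations, `evaluate` stands in for running the candidate agent on tasks, and `flip_one` and `toy_score` are hypothetical helpers that make the example deterministic.

```python
from typing import Callable

Config = dict[str, bool]


def build_test_improve(
    initial: Config,
    propose: Callable[[Config, float], list[Config]],
    evaluate: Callable[[Config], float],
    rounds: int = 3,
) -> tuple[Config, float]:
    """Keep the best-scoring configuration found over a few proposal rounds."""
    best, best_score = initial, evaluate(initial)
    for _ in range(rounds):
        improved = False
        for candidate in propose(best, best_score):  # build
            score = evaluate(candidate)              # test
            if score > best_score:                   # improve
                best, best_score, improved = candidate, score, True
        if not improved:
            break  # no proposal beat the current best
    return best, best_score


def flip_one(config: Config, _score: float) -> list[Config]:
    """Toy proposer (assumption): toggle each option once."""
    return [{**config, key: not value} for key, value in config.items()]


def toy_score(config: Config) -> float:
    """Toy metric (assumption): every enabled feature helps equally."""
    return float(sum(config.values()))
```

Calling `build_test_improve({"notes": False, "compression": False}, flip_one, toy_score)` enables one feature in the first round, the other in the second, and stops in the third when nothing improves. In the setting the article describes, the proposer would also have traces and metrics from earlier runs to guide its edits.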

Results on SWE-Bench Pro and SWE-Bench Verified

The main evaluation uses SWE-Bench Pro, which has 731 GitHub issues that require modifying real repositories until tests pass. All compared systems share the same repositories, tool environment and evaluation harness, so differences come from the scaffolds and models.

On SWE-Bench Pro, the reported Resolve@1 scores are:

  • Claude 4 Sonnet with SWE Agent, 42.7
  • Claude 4 Sonnet with Confucius Code Agent, 45.5
  • Claude 4.5 Sonnet with SWE Agent, 43.6
  • Claude 4.5 Sonnet with Live SWE Agent, 45.8
  • Claude 4.5 Sonnet with Confucius Code Agent, 52.7
  • Claude 4.5 Opus with Anthropic system card scaffold, 52.0
  • Claude 4.5 Opus with Confucius Code Agent, 54.3

These results show that a strong scaffold with a mid-tier model, Claude 4.5 Sonnet with Confucius Code Agent at 52.7, can outperform a stronger model with a weaker scaffold, Claude 4.5 Opus at 52.0.
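The size of the scaffold effect is easier to see as differences. The short Python snippet below only does arithmetic on the Resolve@1 scores listed above; the `SCORES` table and the `gain` helper are conveniences introduced here, not part of the paper.

```python
# Resolve@1 on SWE-Bench Pro, copied from the scores reported above.
SCORES = {
    ("Claude 4 Sonnet", "SWE Agent"): 42.7,
    ("Claude 4 Sonnet", "Confucius Code Agent"): 45.5,
    ("Claude 4.5 Sonnet", "SWE Agent"): 43.6,
    ("Claude 4.5 Sonnet", "Live SWE Agent"): 45.8,
    ("Claude 4.5 Sonnet", "Confucius Code Agent"): 52.7,
    ("Claude 4.5 Opus", "Anthropic system card scaffold"): 52.0,
    ("Claude 4.5 Opus", "Confucius Code Agent"): 54.3,
}


def gain(model: str, baseline: str, scaffold: str = "Confucius Code Agent") -> float:
    """Points gained by switching scaffolds while keeping the model fixed."""
    return round(SCORES[(model, scaffold)] - SCORES[(model, baseline)], 1)


print(gain("Claude 4 Sonnet", "SWE Agent"))                     # 2.8
print(gain("Claude 4.5 Sonnet", "SWE Agent"))                   # 9.1
print(gain("Claude 4.5 Opus", "Anthropic system card scaffold"))  # 2.3
```

With the model held fixed, changing the scaffold is worth between 2.3 and 9.1 points in these figures, and the cross-model comparison in the paragraph above rests on a 0.7 point margin (52.7 against 52.0).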

On SWE-Bench Verified, Confucius Code Agent with Claude 4 Sonnet reaches a Resolve@1 of 74.6, compared with 66.6 for SWE Agent and 72.8 for OpenHands. A mini SWE Agent variant with Claude 4.5 Sonnet reaches 70.6, which is also below Confucius Code Agent with Claude 4 Sonnet.

The research team also reports performance as a function of edited file count. For tasks editing 1 to 2 files, Confucius Code Agent reaches 57.8 Resolve@1; for 3 to 4 files, 49.2; for 5 to 6 files, 44.1; for 7 to 10 files, 52.6; and for more than 10 files, 44.4. This suggests fairly stable behavior on multi-file changes in large codebases, with scores staying between 44.1 and 57.8 rather than declining steadily as more files are touched.

Key Takeaways

  • Scaffolding can outweigh model size: Confucius Code Agent shows that with strong scaffolding, Claude 4.5 Sonnet reaches 52.7 Resolve@1 on SWE-Bench Pro, surpassing Claude 4.5 Opus with a weaker scaffold at 52.0.
  • Hierarchical working memory is essential for long-horizon coding: The Confucius SDK orchestrator uses hierarchical working memory and context compression to manage long trajectories over large repositories, rather than relying on a simple rolling history.
  • Persistent notes act as effective cross-session memory: On 151 SWE-Bench Pro tasks with Claude 4.5 Sonnet, reusing structured notes reduces turns from 64 to 61 and token usage from about 104k to 93k, and increases Resolve@1 from 53.0 to 54.4.
  • Tool configuration materially affects success rates: On a 100-task SWE-Bench Pro subset, moving from simple to richer tool handling with Claude 4.5 Sonnet increases Resolve@1 from 44.0 to 51.6, indicating that learned tool routing and recovery strategies are a major performance lever, not just an implementation detail.
  • Meta agent automates agent design and tuning: A meta agent iteratively proposes prompts, tool sets and configurations, then evaluates and edits them in a build, test, improve loop, and the production Confucius Code Agent is itself generated with this process rather than only manual tuning.
