Nous Research Releases NousCoder-14B: A Competitive Olympiad Programming Model Post-Trained on Qwen3-14B via Reinforcement Learning

By NextTech, January 19, 2026 (7 min read)

Nous Research has released NousCoder-14B, a competitive olympiad programming model that is post-trained on Qwen3-14B using reinforcement learning (RL) with verifiable rewards. On the LiveCodeBench v6 benchmark, which covers problems from 08/01/2024 to 05/01/2025, the model reaches a Pass@1 accuracy of 67.87 percent. That is 7.08 percentage points higher than the Qwen3-14B baseline of 60.79 percent on the same benchmark. The research team trained the model on 24k verifiable coding problems using 48 B200 GPUs over 4 days, and released the weights under the Apache 2.0 license on Hugging Face.

https://nousresearch.com/nouscoder-14b-a-competitive-olympiad-programming-model/

Benchmark focus and what Pass@1 means

LiveCodeBench v6 is designed for competitive programming evaluation. The test split used here contains 454 problems. The training set uses the same recipe as the DeepCoder-14B project from Agentica and Together AI. It combines problems from TACO Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench problems created before 07/31/2024.

The benchmark only includes competitive programming style tasks. For each problem, a solution must respect strict time and memory limits and must pass a large set of hidden input-output tests. Pass@1 is the fraction of problems where the first generated program passes all tests, including time and memory constraints.
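
The Pass@1 definition above can be written down in a few lines. This is a minimal sketch, not the LiveCodeBench harness: the `CaseResult` record and both function names are hypothetical, and it assumes each problem is represented by the per-test results of its first generated program.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """Outcome of one hidden test for the first generated program (hypothetical record)."""
    passed: bool          # output matched the expected output
    within_limits: bool   # stayed inside the time and memory limits


def solved(first_attempt: list[CaseResult]) -> bool:
    """A problem counts only if every test passes and respects the limits."""
    return all(c.passed and c.within_limits for c in first_attempt)


def pass_at_1(results_per_problem: list[list[CaseResult]]) -> float:
    """Fraction of problems whose first generated program passes all tests."""
    if not results_per_problem:
        raise ValueError("need at least one problem")
    return sum(solved(r) for r in results_per_problem) / len(results_per_problem)
```

A single failed or over-limit test is enough to mark the whole problem as unsolved, which is why the metric is strict for programs that are almost correct.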


Dataset construction for execution-based RL

All datasets used for training are composed of verifiable code generation problems. Each problem has a reference implementation and many test cases. The training set contains 24k problems drawn from:

  • TACO Verified
  • PrimeIntellect SYNTHETIC-1
  • LiveCodeBench problems that come before 07/31/2024

The test set is LiveCodeBench v6, which has 454 problems between 08/01/2024 and 05/01/2025.

Every problem is a complete competitive programming task with a description, input format, output format, and test cases. This setup is important for RL because it gives a binary reward signal that is cheap to compute once the code has run.
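
To make the date-based separation between training and test data concrete, here is a small sketch. The `Problem` record, its field names, and `split_livecodebench` are hypothetical; only the cutoff dates come from the article, and treating the test window as inclusive on both ends is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Problem:
    """Hypothetical record for one verifiable task."""
    source: str
    released: date
    description: str
    tests: tuple[tuple[str, str], ...]  # (stdin, expected stdout) pairs


TRAIN_CUTOFF = date(2024, 7, 31)   # training uses LiveCodeBench problems before this date
TEST_START = date(2024, 8, 1)      # LiveCodeBench v6 window start
TEST_END = date(2025, 5, 1)        # LiveCodeBench v6 window end (assumed inclusive)


def split_livecodebench(problems: list[Problem]) -> tuple[list[Problem], list[Problem]]:
    """Split LiveCodeBench problems by release date so the two sets cannot overlap."""
    train = [p for p in problems if p.released < TRAIN_CUTOFF]
    test = [p for p in problems if TEST_START <= p.released <= TEST_END]
    return train, test
```

Because the training cutoff falls before the test window opens, no problem can land in both lists, which is what makes the evaluation set disjoint from the training data.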

RL environment with Atropos and Modal

The RL environment is built using the Atropos framework. NousCoder-14B is prompted using the standard LiveCodeBench prompt format, and it generates Python code for each problem. Each rollout receives a scalar reward that depends on test case results:

  • Reward 1 when the generated code passes all test cases for that problem
  • Reward −1 when the code outputs a wrong answer, exceeds a 15 second time limit, or exceeds a 4 GB memory limit on any test case
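
The reward rule in the two bullets above can be sketched as a single function. This is an illustration rather than the Atropos environment code: `CaseOutcome` and `rollout_reward` are hypothetical names, and interpreting 4 GB as 4 × 1024³ bytes is an assumption.

```python
from dataclasses import dataclass

TIME_LIMIT_S = 15.0                  # per-test time limit from the article
MEMORY_LIMIT_BYTES = 4 * 1024**3     # 4 GB, assumed here to mean 4 GiB


@dataclass
class CaseOutcome:
    """Measured result of running the generated program on one test case (hypothetical)."""
    correct: bool      # program output matched the expected output
    seconds: float     # wall-clock runtime
    peak_bytes: int    # peak memory usage


def rollout_reward(outcomes: list[CaseOutcome]) -> int:
    """Return 1 only if every test is correct and within limits, otherwise -1."""
    for o in outcomes:
        if (not o.correct
                or o.seconds > TIME_LIMIT_S
                or o.peak_bytes > MEMORY_LIMIT_BYTES):
            return -1
    return 1
```

The reward carries no partial credit: a program that passes all but one test receives the same −1 as one that fails everywhere.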

To execute untrusted code safely and at scale, the team uses Modal as an autoscaled sandbox. In the configuration the research team reports actually using, the system launches one Modal container per rollout. Each container runs all test cases for that rollout. This avoids mixing training compute with verification compute and keeps the RL loop stable.

The research team also pipelines inference and verification. When an inference worker finishes a generation, it sends the completion to a Modal verifier and immediately starts a new generation. With many inference workers and a fixed pool of Modal containers, this design keeps the training loop bound by inference compute instead of by verification.
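
The hand-off pattern described above can be illustrated with a queue between producers and a fixed consumer pool. This is a toy sketch using only `asyncio`: it does not call Modal or Atropos, the generation and verification steps are stand-ins, and every name in it is hypothetical.

```python
import asyncio


async def inference_worker(prompts, queue):
    """Generate a completion, hand it off, and move straight to the next prompt."""
    for prompt in prompts:
        await asyncio.sleep(0)                    # stand-in for model generation
        completion = f"solution-for-{prompt}"
        await queue.put((prompt, completion))     # unbounded queue: never waits on verifiers


async def verifier(queue, rewards):
    """One member of the fixed verifier pool; scores completions as they arrive."""
    while True:
        prompt, completion = await queue.get()
        await asyncio.sleep(0)                    # stand-in for a sandboxed test run
        rewards[prompt] = 1 if completion.endswith(prompt) else -1
        queue.task_done()


async def run_pipeline(prompt_shards, n_verifiers=2):
    """Run inference workers and verifiers concurrently and collect rewards."""
    queue = asyncio.Queue()
    rewards = {}
    pool = [asyncio.create_task(verifier(queue, rewards)) for _ in range(n_verifiers)]
    await asyncio.gather(*(inference_worker(shard, queue) for shard in prompt_shards))
    await queue.join()                            # wait until every completion is scored
    for task in pool:
        task.cancel()
    await asyncio.gather(*pool, return_exceptions=True)
    return rewards
```

For example, `asyncio.run(run_pipeline([["a", "b"], ["c"]]))` scores all three prompts while the workers never wait for a verdict before starting their next generation.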

The team discusses 3 verification parallelization strategies: one container per problem, one per rollout, and one per test case. They ultimately avoid the per test case setting because of container launch overhead. Instead, each container evaluates many test cases and runs a small set of the hardest test cases first. If any of these fail, the system can stop verification early.
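
A hardest-first check with early stopping might look like the sketch below. The article does not say how test difficulty is ranked, so the `hardness` callback is an assumption (the usage note uses input length as a crude proxy), and `verify_hardest_first` is a hypothetical name.

```python
from typing import Callable, Iterable, TypeVar

Case = TypeVar("Case")


def verify_hardest_first(
    run_case: Callable[[Case], bool],
    cases: Iterable[Case],
    hardness: Callable[[Case], float],
) -> tuple[bool, int]:
    """Run test cases from hardest to easiest and stop at the first failure.

    Returns (all_passed, number_of_cases_actually_executed).
    """
    ordered = sorted(cases, key=hardness, reverse=True)
    executed = 0
    for case in ordered:
        executed += 1
        if not run_case(case):
            return False, executed   # early exit: remaining cases are skipped
    return True, executed
```

With `hardness=len` over raw input strings, a program that times out on the largest input is rejected after a single execution instead of after the full test suite.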

GRPO objectives: DAPO, GSPO, and GSPO+

NousCoder-14B uses Group Relative Policy Optimization (GRPO), which does not require a separate value model. On top of GRPO, the research team tests 3 objectives: Dynamic sAmpling Policy Optimization (DAPO), Group Sequence Policy Optimization (GSPO), and a modified GSPO variant called GSPO+.

All 3 objectives share the same definition of advantage. The advantage for each rollout is the reward for that rollout normalized by the mean and standard deviation of rewards inside the group. DAPO applies importance weighting and clipping at the token level, and introduces three main changes relative to GRPO:

  • A clip-higher rule that increases exploration for low-probability tokens
  • A token-level policy gradient loss that gives each token equal weight
  • Dynamic sampling, where groups that are all correct or all incorrect are dropped because they carry zero advantage
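
The shared advantage definition, the dynamic sampling filter, and an asymmetric clip can be sketched as follows. This is a simplified illustration, not the training code: the function names are hypothetical, using the population standard deviation with a small `eps` is an assumption, and the clip bounds are illustrative defaults that do not come from the article.

```python
from statistics import mean, pstdev


def group_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each rollout's reward by the mean and std of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)          # population std is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


def keep_group(rewards: list[float]) -> bool:
    """Dynamic sampling: drop groups where every rollout got the same reward."""
    return len(set(rewards)) > 1


def clip_ratio(ratio: float, eps_low: float = 0.2, eps_high: float = 0.28) -> float:
    """Clip-higher: allow more room above 1 than below (bounds are illustrative)."""
    return max(1.0 - eps_low, min(ratio, 1.0 + eps_high))
```

With rewards limited to 1 and −1, a group that is entirely correct or entirely wrong has zero variance, so every advantage is zero and the group adds nothing to the gradient; `keep_group` filters those out before they consume a training step.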

GSPO moves the importance weighting to the sequence level. It defines a sequence importance ratio that aggregates token ratios over the whole program. GSPO+ keeps the sequence-level correction, but it rescales gradients so that tokens are weighted equally regardless of sequence length.

On LiveCodeBench v6, the differences between these objectives are modest. At a context length of 81,920 tokens, DAPO reaches a Pass@1 of 67.87 percent while GSPO and GSPO+ reach 66.26 percent and 66.52 percent. At 40,960 tokens, all 3 objectives cluster around 63 percent Pass@1.
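
The difference between token-level and sequence-level importance ratios can be shown with per-token log-probabilities. The article only says the sequence ratio aggregates token ratios, so the length-normalized form below (the geometric mean of the token ratios) is an assumption about the aggregation, and both function names are hypothetical. GSPO+ is not sketched because the article does not give its rescaling rule.

```python
import math


def token_ratios(new_logps: list[float], old_logps: list[float]) -> list[float]:
    """Token-level importance ratios, one per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps: list[float], old_logps: list[float]) -> float:
    """One ratio for the whole program: exp of the mean per-token log-ratio.

    This equals the geometric mean of the token ratios (assumed aggregation).
    """
    if not new_logps or len(new_logps) != len(old_logps):
        raise ValueError("log-prob lists must be non-empty and the same length")
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))
```

Under this form, a single token whose probability shifts sharply moves the sequence ratio far less than it moves its own token ratio, so clipping decisions are made once per program rather than once per token.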

Iterative context extension and overlong filtering

Qwen3-14B supports long context, and the training follows an iterative context extension schedule. The team first trains the model with a 32k context window and then continues training at the maximum Qwen3-14B context window of 40k. At each stage they select the checkpoint with the best LiveCodeBench score at 40k context and then use YaRN context extension at evaluation time to reach 80k tokens, that is, 81,920 tokens.

A key trick is overlong filtering. When a generated program exceeds the maximum context window, they reset its advantage to zero. This removes that rollout from the gradient signal rather than penalizing it. The research team reports that this approach avoids pushing the model toward shorter solutions for purely optimization reasons and helps maintain quality when they scale context length at test time.
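
Overlong filtering amounts to masking advantages by length. The sketch below is illustrative only: `apply_overlong_filter` is a hypothetical name, and it assumes that "exceeds" means the rollout's token count is strictly greater than the context limit.

```python
def apply_overlong_filter(
    advantages: list[float],
    token_counts: list[int],
    max_context: int,
) -> list[float]:
    """Zero the advantage of any rollout longer than the context window.

    A zero advantage contributes no gradient, so the rollout is ignored
    instead of being treated as a failure.
    """
    if len(advantages) != len(token_counts):
        raise ValueError("advantages and token_counts must be the same length")
    return [
        0.0 if n > max_context else a
        for a, n in zip(advantages, token_counts)
    ]
```

Compared with assigning the −1 failure reward, masking means a long but unfinished solution neither helps nor hurts, so the policy is not pushed toward short answers just to fit the window.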

Key Takeaways

  • NousCoder-14B is a Qwen3-14B based competitive programming model trained with execution-based RL. It reaches 67.87 percent Pass@1 on LiveCodeBench v6, a 7.08 percentage point gain over the Qwen3-14B baseline of 60.79 percent on the same benchmark.
  • The model is trained on 24k verifiable coding problems from TACO Verified, PrimeIntellect SYNTHETIC-1, and LiveCodeBench tasks from before 07/31/2024, and evaluated on a disjoint LiveCodeBench v6 test set of 454 problems from 08/01/2024 to 05/01/2025.
  • The RL setup uses Atropos, with Python solutions executed in sandboxed containers, a simple reward of 1 for solving all test cases and −1 for any failure or resource limit breach, and a pipelined design where inference and verification run asynchronously.
  • The GRPO-based objectives DAPO, GSPO, and GSPO+ are used for long context code RL. All operate on group normalized rewards and show similar performance, with DAPO reaching the best Pass@1 at the longest context of 81,920 tokens.
  • The training uses iterative context extension, first at 32k and then at 40k tokens, together with YaRN-based extension at evaluation time to 81,920 tokens. It includes overlong rollout filtering for stability and ships as a fully reproducible open stack with Apache 2.0 weights and RL pipeline code.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
