MiroMind-M1: Advancing Open-Source Mathematical Reasoning via Context-Aware Multi-Stage Reinforcement Learning

By NextTech, July 31, 2025

Large language models (LLMs) have recently demonstrated remarkable progress in multi-step reasoning, establishing mathematical problem-solving as a rigorous benchmark for assessing advanced capabilities. While proprietary models like GPT-4o and Claude Sonnet 4 lead in performance, their closed-source nature impedes transparency and reproducibility. Addressing these gaps, MiroMind AI released the MiroMind-M1 series, a fully open-source pipeline (spanning datasets, models, training code, and evaluation scripts) that sets new standards for openness and state-of-the-art mathematical reasoning within the Qwen-2.5 model ecosystem.

Architectural Foundation and Motivation

MiroMind-M1 is built on the robust Qwen-2.5 backbone, with enhancements geared explicitly for mathematical reasoning. The team adopts a two-stage training protocol:

  1. Supervised Fine-Tuning (SFT): The model is fine-tuned on 719K carefully curated and verified mathematical problems, equipping it with strong step-by-step reasoning abilities.
  2. Reinforcement Learning with Verifiable Rewards (RLVR): Next, the model undergoes RL on 62K challenging and rigorously verifiable math problems, leveraging reward signals from a robust external verifier.

This approach is motivated both by the need for strong mathematical logic and by the lessons learned from leading reasoning language models (RLMs): imitating chain-of-thought exemplars improves general reasoning, while reinforcement learning, guided by precise rewards, further refines accuracy and efficiency.
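To make the idea of a verifiable reward concrete, here is a minimal sketch of a binary reward in Python. It is an illustration only, not MiroMind's verifier: the `\boxed{...}` answer convention, the function names, and the whitespace-insensitive exact match are all assumptions made for this example.

```python
import re
from typing import Optional

# Assumption: the final answer is wrapped in \boxed{...} with no nested braces.
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def extract_final_answer(response: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in a response, if any."""
    matches = BOXED.findall(response)
    return matches[-1].strip() if matches else None


def verifiable_reward(response: str, reference: str) -> float:
    """Binary reward: 1.0 when the extracted answer equals the reference.

    Hypothetical helper; a real verifier would also check mathematical
    equivalence rather than comparing strings.
    """
    answer = extract_final_answer(response)
    if answer is None:
        return 0.0  # no parsable final answer, so no reward
    same = answer.replace(" ", "") == reference.replace(" ", "")
    return 1.0 if same else 0.0
```

Because the reward depends only on the final answer and a reference, it can be checked automatically for every sampled response, which is what makes the RL signal "verifiable". Answers with nested braces such as `\frac{1}{2}` are not handled by this simple pattern.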

Data Transparency and Quality

A hallmark of the MiroMind-M1 project is the full openness and cleanliness of its training data:

  • SFT corpus composition: Draws from OpenR1, OpenThoughts, Light-R1, and Synthetic-1, ensuring problems have verified solutions and rich, multi-step reasoning traces.
  • Stringent deduplication and decontamination: Employs N-gram overlap filtering to eliminate duplication and data leakage with evaluation sets (e.g., AIME24, AIME25, MATH500).
  • Preference for long trajectories: Experiments show that training on samples with longer reasoning traces consistently yields higher benchmark scores, highlighting the importance of deep semantic content in the reasoning signal.

The resulting dataset provides 719K verified training traces, significantly advancing open, reproducible research over prior efforts.
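The decontamination step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: whitespace tokenization, an n-gram size of 8, and exact-match deduplication after normalization are choices made for this example, and the article does not specify the values or tokenizer the team used.

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(text: str, n: int) -> Set[NGram]:
    """Lowercased, whitespace-tokenized n-grams of a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def decontaminate(train: Iterable[str], eval_set: Iterable[str], n: int = 8) -> List[str]:
    """Drop training samples that share any n-gram with an evaluation problem.

    Also removes duplicates, here simplified to exact matches after
    lowercasing and whitespace normalization (an assumption of this sketch).
    """
    eval_grams: Set[NGram] = set()
    for problem in eval_set:
        eval_grams |= ngrams(problem, n)

    kept: List[str] = []
    seen: Set[str] = set()
    for sample in train:
        if ngrams(sample, n) & eval_grams:
            continue  # overlaps a benchmark problem: possible leakage
        key = " ".join(sample.lower().split())
        if key in seen:
            continue  # duplicate of a sample already kept
        seen.add(key)
        kept.append(sample)
    return kept
```

A smaller `n` filters more aggressively (more false positives), while a larger `n` only catches near-verbatim copies; the right trade-off depends on the corpus.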

Supervised Fine-Tuning: Empirical Excellence

For SFT, MiroMind-SFT-7B is initialized from Qwen2.5-Math-7B and trained with a large context window (max 32,768 tokens) and a no-packing strategy to avoid cross-sample attention contamination. Its performance on key math benchmarks outpaces peer open models:

Model                  AIME24   AIME25   MATH500
DeepSeek-R1-Distill    55.5     40.4     92.8
MiMo-7B-SFT            58.7     44.3     93.0
MiroMind-SFT-7B        60.4     45.0     94.6

These results validate the efficacy of the data curation and training design: richer, deeper samples and no-packing lead to consistently superior performance.
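The "cross-sample attention contamination" that no-packing avoids can be illustrated with a small Python sketch. It assumes a plain causal attention mask over a packed sequence with no per-sample masking; whether the packed baselines compared by the team behaved this way is not stated in the article, and the helper names are hypothetical.

```python
from typing import List


def cross_sample_pairs(sample_ids: List[int]) -> int:
    """Count (query, key) positions in a causal mask that link different samples.

    sample_ids[i] says which training sample the token at position i came from.
    """
    count = 0
    for q in range(len(sample_ids)):
        for k in range(q + 1):  # causal: a token sees itself and earlier tokens
            if sample_ids[q] != sample_ids[k]:
                count += 1
    return count


def pad_batch(samples: List[List[int]], pad_id: int = 0) -> List[List[int]]:
    """No-packing: one sample per row, right-padded to the longest sample."""
    width = max(len(s) for s in samples)
    return [s + [pad_id] * (width - len(s)) for s in samples]


# Packing a 3-token sample and a 2-token sample into one row:
packed_ids = [0, 0, 0, 1, 1]
leaks = cross_sample_pairs(packed_ids)  # tokens of sample 1 see sample 0

# No-packing keeps each sample in its own row, so nothing can leak across rows.
rows = pad_batch([[11, 12, 13], [21, 22]])
```

The cost of no-packing is wasted compute on padding tokens, which is the trade-off against cleaner attention.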

CAMPO: Context-Aware Multi-Stage Policy Optimization

A key innovation in MiroMind-M1’s RLVR phase is the CAMPO algorithm. CAMPO addresses two critical RL challenges, training instability and token inefficiency, through:

  • Multi-stage training with expanding context limits: Training starts with constrained output lengths (e.g., 16K tokens), then gradually increases them to allow deeper reasoning, balancing efficiency and thoroughness.
  • Dynamic repetition penalty: A dedicated repetition critic penalizes outputs exhibiting early or excessive repetition, preventing utility collapse and enforcing output diversity.
  • Accurate external verifier: The reward feedback system is substantially improved to robustly score math answers (including tricky cases with units, π, and percentages), ensuring training signals are tightly aligned with true correctness.
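The tricky cases named in the verifier bullet (units, π, and percentages) can be illustrated with a small normalization sketch in Python. This is not the project's verifier: the function names are hypothetical, and the choices to treat "50%" as 0.5 and to ignore a trailing unit entirely are assumptions made only for this example.

```python
import math
import re
from fractions import Fraction
from typing import Optional

# Assumption: a unit is a trailing \text{...} or a trailing run of ASCII letters.
UNIT_SUFFIX = re.compile(r"\s*(?:\\text\{[^{}]*\}|[a-zA-Z]+)\s*$")


def to_number(answer: str) -> Optional[float]:
    """Best-effort numeric value of an answer string, or None if unparsable."""
    s = answer.strip().strip("$").replace("\\%", "%").replace("\\pi", "π")
    s = s.replace(",", "")  # thousands separators
    percent = s.endswith("%")
    if percent:
        s = s[:-1]
    s = UNIT_SUFFIX.sub("", s).replace(" ", "")  # drop one trailing unit
    has_pi = "π" in s
    s = s.replace("π", "")
    if has_pi and (s == "" or s.startswith("/")):
        s = "1" + s  # a bare π or π/2 has an implicit coefficient of 1
    try:
        value = float(Fraction(s))
    except (ValueError, ZeroDivisionError):
        return None
    if has_pi:
        value *= math.pi
    if percent:
        value /= 100.0
    return value


def answers_match(predicted: str, reference: str, tol: float = 1e-9) -> bool:
    """Numeric comparison when both sides parse, else exact string match."""
    p, r = to_number(predicted), to_number(reference)
    if p is None or r is None:
        return predicted.strip() == reference.strip()
    return math.isclose(p, r, rel_tol=tol, abs_tol=tol)
```

A production verifier would need far more (nested LaTeX such as `\frac{1}{2}`, unit conversion, symbolic equivalence); the sketch only shows why naive string comparison would mis-score answers like `50\%` against `0.5`.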

CAMPO not only stabilizes RL dynamics but also results in models that solve problems with fewer, more relevant tokens, accelerating inference and reducing costs without sacrificing accuracy.
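As a rough illustration of the first two CAMPO ingredients, the Python sketch below combines a staged output-length budget with a repetition penalty that is larger when repetition starts earlier. It is not the CAMPO implementation: the step boundaries, the second-stage budget of 32,768 tokens, the n-gram repetition detector, and the way the penalty is subtracted from the correctness reward are all assumptions of this example.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical schedule: (first training step of the stage, max response tokens).
# Only the 16K starting budget comes from the article; the rest is assumed.
STAGES: List[Tuple[int, int]] = [(0, 16_384), (1_000, 32_768)]


def max_tokens_at(step: int, stages: Sequence[Tuple[int, int]] = STAGES) -> int:
    """Response-length budget in effect at a given training step."""
    limit = stages[0][1]
    for start, budget in stages:
        if step >= start:
            limit = budget
    return limit


def first_repetition_index(tokens: Sequence[int], n: int = 4) -> Optional[int]:
    """Position where some n-gram first recurs, or None if nothing repeats."""
    seen = set()
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if gram in seen:
            return i
        seen.add(gram)
    return None


def repetition_penalty(tokens: Sequence[int], n: int = 4, scale: float = 1.0) -> float:
    """Penalty in [0, scale]; earlier repetition is penalized more (assumption)."""
    if not tokens:
        return 0.0
    idx = first_repetition_index(tokens, n)
    if idx is None:
        return 0.0
    return scale * (1.0 - idx / len(tokens))


def shaped_reward(correct: bool, tokens: Sequence[int]) -> float:
    """Correctness reward minus the repetition penalty (assumed combination)."""
    return (1.0 if correct else 0.0) - repetition_penalty(tokens)
```

In a training loop, `max_tokens_at(step)` would cap generation length for the current stage, and `shaped_reward` would replace the raw correctness signal so that looping outputs earn less even when the final answer is right.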

Benchmark Performance: State-of-the-Art Efficiency

MiroMind’s open models achieve highly competitive or state-of-the-art results among open Qwen-2.5-based math models (7B/32B parameters):

Model              AIME24   AIME25   MATH500
DeepSeek-R1-7B     55.5     39.2     –
MiMo-7B-RL         68.2     55.4     95.8
Skywork-OR1-7B     72.2     54.6     –
MiroMind-RL-7B     73.4     57.8     96.7
Skywork-OR1-32B    77.1     68.2     97.5
MiroMind-RL-32B    77.5     65.6     96.4

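Notably, MiroMind-M1-RL models are competitive with or ahead of their peers on accuracy (the 7B model leads on all three benchmarks in the table, while the 32B model leads on AIME24 but trails Skywork-OR1-32B on AIME25 and MATH500) and reach these scores with greater token efficiency: the 32B model produces shorter, more concise solutions without loss of correctness, thanks to CAMPO’s training.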

Full Stack and Reproducibility

Every component of the MiroMind-M1 stack is openly released:

  • Model weights (SFT and RL checkpoints for both 7B and 32B scales)
  • Datasets (full 719K SFT, 62K RLVR)
  • Training scripts (supporting multi-node distributed training on Ray)
  • Evaluation code (standardized scripts and benchmark configs)

Researchers can replicate, audit, and extend MiroMind-M1 from raw data to trained models, advancing reproducibility and accelerating new open research.

Conclusion

MiroMind-M1 demonstrates that with careful data curation, innovative RL algorithms (CAMPO), and radical transparency, open-source language models can rival proprietary systems in advanced mathematical reasoning. This project sets a new bar for reproducibility and collaborative advancement in reasoning LLMs, providing both a high-quality resource and a robust platform for future innovation.


Check out the Paper, GitHub Page and Model on Hugging Face. All credit for this research goes to the researchers of this project.


Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.
