AI & Machine Learning

Cerebras Releases MiniMax-M2-REAP-162B-A10B: A Memory-Efficient Version of MiniMax-M2 for Long-Context Coding Agents

By NextTech · November 16, 2025 · 6 min read


Cerebras has released MiniMax-M2-REAP-162B-A10B, a compressed Sparse Mixture-of-Experts (SMoE) causal language model derived from MiniMax-M2 using the new Router-weighted Expert Activation Pruning (REAP) method. The model retains the behavior of the original 230B-total, 10B-active MiniMax-M2 while pruning experts and reducing memory for deployment-focused workloads such as coding agents and tool calling.

Architecture and core specifications

MiniMax-M2-REAP-162B-A10B has these key properties:

  • Base model: MiniMax-M2
  • Compression method: REAP, Router-weighted Expert Activation Pruning
  • Total parameters: 162B
  • Active parameters per token: 10B
  • Layers: 62 transformer blocks
  • Attention heads per layer: 48
  • Experts: 180 experts, obtained by pruning a 256-expert configuration
  • Activated experts per token: 8
  • Context length: 196,608 tokens
  • License: modified MIT, derived from MiniMaxAI MiniMax-M2

The SMoE design means that the model stores 162B parameters, but each token routes through only a small set of experts, so the effective compute cost per token is similar to that of a 10B dense model. MiniMax-M2 itself is positioned as an MoE model built for coding and agentic workflows, with 230B total parameters and 10B active, and this checkpoint inherits that design.
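As a back-of-the-envelope illustration of these numbers (a sketch, not measured figures), the weight-memory and per-token compute implications can be estimated as follows, assuming bf16 weights at 2 bytes per parameter and the common approximation of roughly 2 FLOPs per active parameter per generated token:

```python
# Back-of-the-envelope estimate: weight memory scales with total parameters,
# per-token compute with active parameters. Assumes bf16 (2 bytes/param) and
# ~2 FLOPs per active parameter per generated token.

def estimate(total_params_b: float, active_params_b: float,
             bytes_per_param: int = 2) -> tuple[float, float]:
    # total_params_b * 1e9 params * bytes / 1e9 bytes-per-GB simplifies to:
    weight_mem_gb = total_params_b * bytes_per_param
    flops_per_token = 2 * active_params_b * 1e9
    return weight_mem_gb, flops_per_token

base_mem, base_flops = estimate(230, 10)   # MiniMax-M2
reap_mem, reap_flops = estimate(162, 10)   # MiniMax-M2-REAP-162B-A10B

print(f"base weights ~{base_mem:.0f} GB, REAP weights ~{reap_mem:.0f} GB")
print(f"memory saved ~{100 * (1 - reap_mem / base_mem):.0f}%")
print(f"FLOPs per token unchanged: {base_flops == reap_flops}")
```

The last line is the point of the release: pruning experts shrinks the stored weights by about 30 percent while the active-parameter count, and hence per-token compute, stays the same.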

How REAP compresses MiniMax-M2

MiniMax-M2-REAP-162B-A10B is created by applying REAP uniformly across all MoE blocks of MiniMax-M2, at a 30 percent expert pruning rate.

The REAP method defines a saliency score for each expert that combines:

  • Router gate values: how often and how strongly the router selects that expert
  • Expert activation norms: the magnitude of the expert's output when active

Experts that contribute minimally to the layer output under this combined criterion are removed. The remaining experts keep their original weights, and the router retains separate gates for each of them. This is one-shot compression; there is no further fine-tuning after pruning in the method definition.
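The scoring-and-pruning step can be sketched roughly as follows. This is our own simplified reading of a REAP-style criterion, not the reference implementation; all tensors are random stand-ins:

```python
import numpy as np

# REAP-style saliency sketch: for each expert, average the router gate value
# times the expert's output norm over the tokens that activated it, then
# prune the lowest-scoring fraction of experts.

rng = np.random.default_rng(0)
n_tokens, n_experts, top_k = 1000, 16, 2

gates = rng.random((n_tokens, n_experts))
# Top-k routing: keep the k largest gates per token, zero the rest.
topk_idx = np.argsort(gates, axis=1)[:, -top_k:]
mask = np.zeros_like(gates, dtype=bool)
np.put_along_axis(mask, topk_idx, True, axis=1)
gates = np.where(mask, gates, 0.0)

out_norms = rng.random((n_tokens, n_experts))   # stand-in for ||f_e(x)||

# Saliency: mean gated contribution over tokens where the expert fired.
fired = gates > 0
saliency = (gates * out_norms).sum(0) / np.maximum(fired.sum(0), 1)

prune_rate = 0.30
n_prune = int(prune_rate * n_experts)
pruned = np.argsort(saliency)[:n_prune]          # lowest-saliency experts go
kept = np.setdiff1d(np.arange(n_experts), pruned)
print(f"kept {kept.size}/{n_experts} experts at a {prune_rate:.0%} target rate")
```

The surviving experts keep their weights and their individual router gates, which is what distinguishes this from expert merging.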

A core theoretical end result within the REAP’s analysis paper is that knowledgeable merging with summed gates causes purposeful subspace collapse. When consultants are merged, the router loses its unbiased, enter dependent management over these consultants, so a single merged knowledgeable should approximate an enter dependent combination that was initially expressed by a number of consultants. The analysis group proves that, at any time when the router coverage will depend on the enter and the consultants should not an identical, this introduces irreducible error. In distinction, pruning removes some consultants however preserves unbiased management of the survivors, so the error scales with the gate weight of the eliminated consultants.

Across a set of SMoE models in the 20B to 1T parameter range, REAP consistently outperforms expert merging and other pruning criteria on generative benchmarks such as code generation, mathematical reasoning, and tool calling, especially at 50 percent compression.

Accuracy under 30 percent expert pruning

The comparison covers three checkpoints on standard coding, reasoning, and agentic benchmarks:

  • MiniMax-M2 (230B, base model)
  • MiniMax-M2-REAP-172B-A10B, 25 percent pruning
  • MiniMax-M2-REAP-162B-A10B, 30 percent pruning
Source: https://huggingface.co/cerebras/MiniMax-M2-REAP-162B-A10B

On coding benchmarks such as HumanEval, HumanEval Plus, MBPP, and MBPP Plus, the 162B REAP model stays very close to the base model. HumanEval sits in the 90% range and MBPP stays in the 80% range, with the 172B and 162B models essentially tracking the original MiniMax-M2 within a few points.

On reasoning benchmarks such as AIME 25 and MATH 500, there are small shifts between the three models, but there is no collapse at 30 percent pruning, and the 162B checkpoint remains competitive with the base model.

On tool calling and agentic evaluation, represented by τ²-bench in a telecom setting, the 162B REAP model again matches the base model within small variance. The model card explicitly states that this checkpoint retains almost identical performance while being about 30 percent lighter in parameter count.

These results line up with the broader REAP study, which reports near-lossless compression for code generation and tool calling on several large SMoE architectures when pruning experts using the REAP criterion.

Deployment, memory usage, and observed throughput

Cerebras provides a direct vLLM serve example and positions MiniMax-M2-REAP-162B-A10B as a drop-in model for the existing MiniMax-M2 integration:

vllm serve cerebras/MiniMax-M2-REAP-162B-A10B \
    --tensor-parallel-size 8 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --trust-remote-code \
    --enable_expert_parallel \
    --enable-auto-tool-choice

If the run hits memory limits, the model card recommends reducing --max-num-seqs, for example to 64, to keep batch size in check on a given GPU.
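Once the serve command above is running, vLLM exposes an OpenAI-compatible HTTP API (by default on port 8000). A minimal request body might look like the following sketch; the prompt is a placeholder, and the commented send step assumes the server is actually up at that address:

```python
import json

# Sketch of a chat-completions request body for vLLM's OpenAI-compatible
# endpoint. The model name matches the checkpoint served above; the prompt
# and port are placeholders for illustration.
payload = {
    "model": "cerebras/MiniMax-M2-REAP-162B-A10B",
    "messages": [
        {"role": "user", "content": "Refactor this function to be iterative."}
    ],
    "max_tokens": 512,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8000/v1/chat/completions",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```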

Key Takeaways

  1. SMoE architecture with efficient compute: MiniMax-M2-REAP-162B-A10B is a Sparse Mixture-of-Experts model with 162B total parameters and 10B active parameters per token, so the compute cost per token is close to that of a 10B dense model while keeping frontier-scale capacity.
  2. REAP expert pruning retains the behavior of MiniMax-M2: The model is produced by applying REAP (Router-weighted Expert Activation Pruning) to MiniMax-M2 at roughly 30 percent expert pruning, removing experts based on router gate values and expert activation norms while leaving surviving experts and the router structure intact.
  3. Near-lossless accuracy at 30 percent compression: On coding benchmarks such as HumanEval and MBPP, and on reasoning benchmarks such as AIME 25 and MATH 500, the 162B REAP variant tracks the 230B MiniMax-M2 and a 172B REAP variant within a few points, showing near-lossless compression for code, reasoning, and tool use.
  4. Pruning outperforms expert merging for generative SMoE: The REAP study shows that pruning experts using a saliency criterion avoids the functional subspace collapse seen with expert merging in generative tasks, and performs better across large SMoE models in the 22B to about 1T parameter range.

Comparison Table

[Comparison table image. Source: Marktechpost.com]

Cerebras' release of MiniMax-M2-REAP-162B-A10B is a strong signal that Router-weighted Expert Activation Pruning is ready for real workloads, not just a research curiosity. The checkpoint shows that a 30 percent expert pruning schedule can keep MiniMax-M2 230B-A10B behavior almost intact while cutting memory and preserving long-context coding, reasoning, and tool-calling performance, which is exactly what SMoE researchers need for practical deployment. Overall, Cerebras is quietly turning expert pruning into production infrastructure for frontier-class SMoE models.


Check out the Model Weights.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
