AI & Machine Learning

Cerebras Releases MiniMax-M2-REAP-162B-A10B: A Memory-Efficient Version of MiniMax-M2 for Long-Context Coding Agents

By NextTech · November 16, 2025 · 6 min read


Cerebras has released MiniMax-M2-REAP-162B-A10B, a compressed Sparse Mixture-of-Experts (SMoE) causal language model derived from MiniMax-M2 using the new Router-weighted Expert Activation Pruning (REAP) method. The model retains the behavior of the original 230B-total, 10B-active MiniMax-M2 while pruning experts and reducing memory for deployment-focused workloads such as coding agents and tool calling.

Architecture and core specifications

MiniMax-M2-REAP-162B-A10B has these key properties:

  • Base model: MiniMax-M2
  • Compression method: REAP, Router-weighted Expert Activation Pruning
  • Total parameters: 162B
  • Active parameters per token: 10B
  • Layers: 62 transformer blocks
  • Attention heads per layer: 48
  • Experts: 180 experts, obtained by pruning a 256-expert configuration
  • Activated experts per token: 8
  • Context length: 196,608 tokens
  • License: modified MIT, derived from MiniMaxAI MiniMax-M2

The SMoE design means that the model stores 162B parameters, but each token routes through only a small set of experts, so the effective compute cost per token is similar to that of a 10B dense model. MiniMax-M2 itself is positioned as an MoE model built for coding and agentic workflows, with 230B total parameters and 10B active, and this checkpoint inherits that design.
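As a back-of-the-envelope illustration of these numbers (a sketch, not measured figures), the weight-memory and per-token compute implications can be estimated as follows, assuming bf16 weights at 2 bytes per parameter and the common approximation of roughly 2 FLOPs per active parameter per generated token:

```python
# Back-of-the-envelope estimate: weight memory scales with total parameters,
# per-token compute with active parameters. Assumes bf16 (2 bytes/param) and
# ~2 FLOPs per active parameter per generated token.

def estimate(total_params_b: float, active_params_b: float,
             bytes_per_param: int = 2) -> tuple[float, float]:
    # total_params_b * 1e9 params * bytes / 1e9 bytes-per-GB simplifies to:
    weight_mem_gb = total_params_b * bytes_per_param
    flops_per_token = 2 * active_params_b * 1e9
    return weight_mem_gb, flops_per_token

base_mem, base_flops = estimate(230, 10)   # MiniMax-M2
reap_mem, reap_flops = estimate(162, 10)   # MiniMax-M2-REAP-162B-A10B

print(f"base weights ~{base_mem:.0f} GB, REAP weights ~{reap_mem:.0f} GB")
print(f"memory saved ~{100 * (1 - reap_mem / base_mem):.0f}%")
print(f"FLOPs per token unchanged: {base_flops == reap_flops}")
```

The last line is the point of the release: pruning experts shrinks the stored weights by about 30 percent while the active-parameter count, and hence per-token compute, stays the same.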

How REAP compresses MiniMax-M2

MiniMax-M2-REAP-162B-A10B is created by applying REAP uniformly across all MoE blocks of MiniMax-M2, at a 30 percent expert pruning rate.

The REAP method defines a saliency score for each expert that combines:

  • Router gate values: how often and how strongly the router selects that expert
  • Expert activation norms: the magnitude of the expert's output when active

Experts that contribute minimally to the layer output under this combined criterion are removed. The remaining experts keep their original weights, and the router retains separate gates for each of them. This is one-shot compression; there is no further fine-tuning after pruning in the method definition.
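The scoring-and-pruning step can be sketched roughly as follows. This is our own simplified reading of a REAP-style criterion, not the reference implementation; all tensors are random stand-ins:

```python
import numpy as np

# REAP-style saliency sketch: for each expert, average the router gate value
# times the expert's output norm over the tokens that activated it, then
# prune the lowest-scoring fraction of experts.

rng = np.random.default_rng(0)
n_tokens, n_experts, top_k = 1000, 16, 2

gates = rng.random((n_tokens, n_experts))
# Top-k routing: keep the k largest gates per token, zero the rest.
topk_idx = np.argsort(gates, axis=1)[:, -top_k:]
mask = np.zeros_like(gates, dtype=bool)
np.put_along_axis(mask, topk_idx, True, axis=1)
gates = np.where(mask, gates, 0.0)

out_norms = rng.random((n_tokens, n_experts))   # stand-in for ||f_e(x)||

# Saliency: mean gated contribution over tokens where the expert fired.
fired = gates > 0
saliency = (gates * out_norms).sum(0) / np.maximum(fired.sum(0), 1)

prune_rate = 0.30
n_prune = int(prune_rate * n_experts)
pruned = np.argsort(saliency)[:n_prune]          # lowest-saliency experts go
kept = np.setdiff1d(np.arange(n_experts), pruned)
print(f"kept {kept.size}/{n_experts} experts at a {prune_rate:.0%} target rate")
```

The surviving experts keep their weights and their individual router gates, which is what distinguishes this from expert merging.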

A core theoretical end result within the REAP’s analysis paper is that knowledgeable merging with summed gates causes purposeful subspace collapse. When consultants are merged, the router loses its unbiased, enter dependent management over these consultants, so a single merged knowledgeable should approximate an enter dependent combination that was initially expressed by a number of consultants. The analysis group proves that, at any time when the router coverage will depend on the enter and the consultants should not an identical, this introduces irreducible error. In distinction, pruning removes some consultants however preserves unbiased management of the survivors, so the error scales with the gate weight of the eliminated consultants.

Across a set of SMoE models in the 20B to 1T parameter range, REAP consistently outperforms expert merging and other pruning criteria on generative benchmarks such as code generation, mathematical reasoning, and tool calling, especially at 50 percent compression.

Accuracy under 30 percent expert pruning

The comparison covers three checkpoints on standard coding, reasoning, and agentic benchmarks:

  • MiniMax-M2 (230B, base model)
  • MiniMax-M2-REAP-172B-A10B, 25 percent pruning
  • MiniMax-M2-REAP-162B-A10B, 30 percent pruning
Source: https://huggingface.co/cerebras/MiniMax-M2-REAP-162B-A10B

On coding benchmarks such as HumanEval, HumanEval Plus, MBPP, and MBPP Plus, the 162B REAP model stays very close to the base model. HumanEval sits in the 90% range and MBPP stays in the 80% range, with the 172B and 162B models essentially tracking the original MiniMax-M2 within a few points.

On reasoning benchmarks such as AIME 25 and MATH 500, there are small shifts between the three models, but there is no collapse at 30 percent pruning, and the 162B checkpoint remains competitive with the base model.

On tool calling and agentic evaluation, represented by τ²-bench in a telecom setting, the 162B REAP model again matches the base model within small variance. The model card explicitly states that this checkpoint retains almost identical performance while being about 30 percent lighter in parameter count.

These results line up with the broader REAP study, which reports near-lossless compression for code generation and tool calling on several large SMoE architectures when pruning experts using the REAP criterion.

Deployment, memory usage, and observed throughput

Cerebras provides a direct vLLM serve example and positions MiniMax-M2-REAP-162B-A10B as a drop-in model for the existing MiniMax-M2 integration:

vllm serve cerebras/MiniMax-M2-REAP-162B-A10B \
    --tensor-parallel-size 8 \
    --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --trust-remote-code \
    --enable_expert_parallel \
    --enable-auto-tool-choice

If the run hits memory limits, the model card recommends reducing --max-num-seqs, for example to 64, to keep batch size in check on a given GPU.
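Once the serve command above is running, vLLM exposes an OpenAI-compatible HTTP API (by default on port 8000). A minimal request body might look like the following sketch; the prompt is a placeholder, and the commented send step assumes the server is actually up at that address:

```python
import json

# Sketch of a chat-completions request body for vLLM's OpenAI-compatible
# endpoint. The model name matches the checkpoint served above; the prompt
# and port are placeholders for illustration.
payload = {
    "model": "cerebras/MiniMax-M2-REAP-162B-A10B",
    "messages": [
        {"role": "user", "content": "Refactor this function to be iterative."}
    ],
    "max_tokens": 512,
}
body = json.dumps(payload)
print(body)

# To actually send it (requires the server to be running):
#   import urllib.request
#   req = urllib.request.Request(
#       "http://localhost:8000/v1/chat/completions",
#       data=body.encode(),
#       headers={"Content-Type": "application/json"})
#   print(urllib.request.urlopen(req).read().decode())
```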

Key Takeaways

  1. SMoE architecture with efficient compute: MiniMax-M2-REAP-162B-A10B is a Sparse Mixture-of-Experts model with 162B total parameters and 10B active parameters per token, so the compute cost per token is close to that of a 10B dense model while keeping frontier-scale capacity.
  2. REAP expert pruning retains the behavior of MiniMax-M2: The model is produced by applying REAP (Router-weighted Expert Activation Pruning) to MiniMax-M2 at roughly 30 percent expert pruning, removing experts based on router gate values and expert activation norms while leaving surviving experts and the router structure intact.
  3. Near-lossless accuracy at 30 percent compression: On coding benchmarks such as HumanEval and MBPP, and on reasoning benchmarks such as AIME 25 and MATH 500, the 162B REAP variant tracks the 230B MiniMax-M2 and a 172B REAP variant within a few points, showing near-lossless compression for code, reasoning, and tool use.
  4. Pruning outperforms expert merging for generative SMoE: The REAP study shows that pruning experts using a saliency criterion avoids the functional subspace collapse seen with expert merging in generative tasks, and performs better across large SMoE models in the 22B to about 1T parameter range.

Comparison Table

[Comparison table image. Source: Marktechpost.com]

Cerebras' release of MiniMax-M2-REAP-162B-A10B is a strong signal that Router-weighted Expert Activation Pruning is ready for real workloads, not just a research curiosity. The checkpoint shows that a 30 percent expert pruning schedule can keep MiniMax-M2 230B-A10B behavior almost intact while cutting memory and preserving long-context coding, reasoning, and tool-calling performance, which is exactly what SMoE researchers need for practical deployment. Overall, Cerebras is quietly turning expert pruning into production infrastructure for frontier-class SMoE models.


Check out the Model Weights.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
