NVIDIA AI Releases Nemotron Nano 2 AI Models: A Production-Ready Enterprise AI Model Family and 6x Faster than Similar Sized Model

By NextTech, August 19, 2025 (5 min read)

NVIDIA has unveiled the Nemotron Nano 2 family, a line of hybrid Mamba-Transformer large language models (LLMs) that not only push state-of-the-art reasoning accuracy but also deliver up to 6× higher inference throughput than models of comparable size. The release stands out for its transparency in data and methodology: NVIDIA provides most of the training corpus and recipes alongside the model checkpoints. Critically, these models keep a 128K-token context capability on a single midrange GPU, significantly lowering barriers to long-context reasoning and real-world deployment.

Key Highlights

  • 6× throughput vs. similarly sized models: Nemotron Nano 2 models deliver up to 6.3× the token generation speed of models like Qwen3-8B in reasoning-heavy scenarios, without sacrificing accuracy.
  • Superior accuracy for reasoning, coding & multilingual tasks: Benchmarks show on-par or better results vs. competitive open models, notably exceeding peers in math, code, tool use, and long-context tasks.
  • 128K context length on a single GPU: Efficient pruning and the hybrid architecture make it possible to run 128,000-token inference on a single NVIDIA A10G GPU (22GiB).
  • Open data & weights: Most of the pretraining and post-training datasets, including code, math, multilingual, synthetic SFT, and reasoning data, are released with permissive licensing on Hugging Face.

Hybrid Architecture: Mamba Meets Transformer

Nemotron Nano 2 is built on a hybrid Mamba-Transformer backbone, inspired by the Nemotron-H architecture. Most traditional self-attention layers are replaced by efficient Mamba-2 layers, with only about 8% of the total layers using self-attention. The architecture is carefully crafted:

  • Model Details: The 9B-parameter model has 56 layers (out of 62 in the pre-trained model), a hidden size of 4480, grouped-query attention, and Mamba-2 state-space layers, which together support both scalability and long-sequence retention.
  • Mamba-2 Innovations: These state-space layers, recently popularized as high-throughput sequence models, are interleaved with sparse self-attention (to preserve long-range dependencies) and large feed-forward networks.

This structure enables high throughput on reasoning tasks that require “thinking traces” (long generations based on long, in-context input), where traditional transformer-based architectures often slow down or run out of memory.
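To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how the key-value cache grows with the number of self-attention layers. The layer count (56), the roughly 8% attention share, and the 128,000-token context come from the article; the number of KV heads (8), the head dimension (128), and 2-byte values are illustrative assumptions, not published figures for this model.

```python
def kv_cache_gib(attn_layers, seq_len, kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV-cache size in GiB for a single sequence.

    kv_heads, head_dim and bytes_per_value are assumed values for illustration.
    """
    # Factor 2: each attention layer stores one key and one value tensor.
    total_bytes = 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_value
    return total_bytes / 2**30


TOTAL_LAYERS = 56        # from the article
ATTENTION_SHARE = 0.08   # "about 8%" from the article
SEQ_LEN = 128_000        # 128,000-token context from the article

# Number of self-attention layers implied by the ~8% share.
hybrid_attn_layers = round(TOTAL_LAYERS * ATTENTION_SHARE)

# Hypothetical comparison: every layer uses attention vs. only ~8% of layers.
full_attention_gib = kv_cache_gib(TOTAL_LAYERS, SEQ_LEN)
hybrid_gib = kv_cache_gib(hybrid_attn_layers, SEQ_LEN)

print(f"attention layers in hybrid stack: {hybrid_attn_layers}")
print(f"KV cache, all-attention stack:    {full_attention_gib:.2f} GiB")
print(f"KV cache, hybrid stack:           {hybrid_gib:.2f} GiB")
```

Under these assumptions the all-attention stack would need more cache than the 22GiB the article cites for the A10G, while the hybrid stack needs only a small fraction of it. The sketch ignores model weights, activations, and the Mamba-2 layers' own state, which does not grow with sequence length.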


Training Recipe: Massive Data Diversity, Open Sourcing

Nemotron Nano 2 models are trained and distilled from a 12B-parameter teacher model using an extensive, high-quality corpus. NVIDIA’s data transparency is a highlight:

  • 20T tokens pretraining: Data sources include curated and synthetic corpora for web, math, code, multilingual, academic, and STEM domains.
  • Major Datasets Released:
    • Nemotron-CC-v2: Multilingual web crawl (15 languages), synthetic Q&A rephrasing, deduplication.
    • Nemotron-CC-Math: 133B tokens of math content, standardized to LaTeX, with a “highest quality” subset of over 52B tokens.
    • Nemotron-Pretraining-Code: Curated and quality-filtered GitHub source code; rigorous decontamination and deduplication.
    • Nemotron-Pretraining-SFT: Synthetic, instruction-following datasets across STEM, reasoning, and general domains.
  • Post-training Data: Includes over 80B tokens of supervised fine-tuning (SFT), RLHF, tool-calling, and multilingual datasets, most of which are open-sourced for direct reproducibility.

Alignment, Distillation, and Compression: Unlocking Cost-Effective, Long-Context Reasoning

NVIDIA’s model compression process is built on the “Minitron” and Mamba pruning frameworks:

  • Knowledge distillation from the 12B teacher reduces the model to 9B parameters, with careful pruning of layers, FFN dimensions, and embedding width.
  • Multi-stage SFT and RL: Includes tool-calling optimization (BFCL v3), instruction-following (IFEval), DPO and GRPO reinforcement, and “thinking budget” control (support for controllable reasoning-token budgets at inference).
  • Memory-targeted NAS: Through architecture search, the pruned models are specifically engineered so that the model and key-value cache both fit, and remain performant, within the A10G GPU memory at a 128k context length.
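As an illustration of the distillation step in the list above, the following is a minimal sketch of a generic logit-distillation loss: the KL divergence between the teacher's and the student's next-token distributions. This is the textbook formulation, not NVIDIA's exact recipe; the function names, the `temperature` parameter, and the toy logits are assumptions made for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits to probabilities (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over one token's vocabulary distribution."""
    p = softmax(teacher_logits, temperature)  # teacher distribution
    q = softmax(student_logits, temperature)  # student distribution
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy three-word vocabulary; the values are made up for illustration.
teacher = [2.0, 1.0, 0.1]
matched = distillation_loss(teacher, teacher)             # student agrees with teacher
mismatched = distillation_loss(teacher, [0.1, 1.0, 2.0])  # student disagrees

print(f"loss when student matches teacher: {matched:.6f}")
print(f"loss when student disagrees:       {mismatched:.6f}")
```

The loss is zero when the pruned student reproduces the teacher's distribution and grows as the two diverge, which is what lets a smaller model be trained to imitate a larger one.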

The result: inference up to 6× faster than open competitors in scenarios with large input and output token counts, without compromising task accuracy.
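The controllable reasoning-token budget mentioned above can be pictured as a cap on how many reasoning tokens are kept before the model is made to close its reasoning span and answer. The sketch below is a hypothetical illustration only: the function name and the `</think>` end marker are assumptions, and the article does not describe how the control is actually implemented.

```python
def apply_thinking_budget(reasoning_tokens, budget, end_marker="</think>"):
    """Keep at most `budget` reasoning tokens, then close the reasoning span.

    `end_marker` is a hypothetical delimiter chosen for this example.
    """
    if budget < 0:
        raise ValueError("budget must be non-negative")
    kept = list(reasoning_tokens[:budget])  # truncate the trace at the budget
    kept.append(end_marker)                 # always terminate the reasoning span
    return kept


trace = ["step1", "step2", "step3", "step4"]
print(apply_thinking_budget(trace, budget=2))
print(apply_thinking_budget(trace, budget=10))
```

In a real serving stack the cap would be enforced during decoding rather than on a finished list, but the effect is the same: a smaller budget trades reasoning depth for lower latency and cost.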

Benchmarking: Superior Reasoning and Multilingual Capabilities

In head-to-head evaluations, Nemotron-Nano-9B-v2 leads on most benchmarks, though Qwen3-8B scores higher on MMLU and Global-MMLU-Lite:

| Task/Bench | Nemotron-Nano-9B-v2 | Qwen3-8B | Gemma3-12B |
|---|---|---|---|
| MMLU (General) | 74.5 | 76.4 | 73.6 |
| MMLU-Pro (5-shot) | 59.4 | 56.3 | 45.1 |
| GSM8K CoT (Math) | 91.4 | 84.0 | 74.5 |
| MATH | 80.5 | 55.4 | 42.4 |
| HumanEval+ | 58.5 | 57.6 | 36.7 |
| RULER-128K (Long Context) | 82.2 | – | 80.7 |
| Global-MMLU-Lite (Avg Multi) | 69.9 | 72.8 | 71.9 |
| MGSM Multilingual Math (Avg) | 84.8 | 64.5 | 57.1 |
  • Throughput (tokens/s/GPU) at 8k input/16k output:
    • Nemotron-Nano-9B-v2: up to 6.3× Qwen3-8B in reasoning traces.
    • Maintains up to 128k context with batch size = 1, which was previously impractical on midrange GPUs.
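To see what the throughput ratio above means in wall-clock terms, here is a small arithmetic sketch. The 6.3× ratio and the 16k-output setting come from the article; the baseline rate of 100 tokens/s is a hypothetical placeholder rather than a measured figure, and "16k" is taken as 16,000 tokens.

```python
def generation_seconds(output_tokens, tokens_per_second):
    """Time to generate `output_tokens` at a steady decoding rate."""
    return output_tokens / tokens_per_second


OUTPUT_TOKENS = 16_000   # "16k output" from the article (assumed to mean 16,000)
SPEEDUP = 6.3            # "up to 6.3x" from the article
BASELINE_TPS = 100.0     # hypothetical baseline rate, for illustration only

baseline_s = generation_seconds(OUTPUT_TOKENS, BASELINE_TPS)
faster_s = generation_seconds(OUTPUT_TOKENS, BASELINE_TPS * SPEEDUP)

print(f"baseline:    {baseline_s:.1f} s")
print(f"6.3x faster: {faster_s:.1f} s")
```

Whatever the true baseline rate is, the ratio between the two times stays at 6.3, so a long reasoning trace that takes minutes on the slower model finishes in a fraction of that time.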

Conclusion

NVIDIA’s Nemotron Nano 2 release is an important moment for open LLM research: it redefines what is possible on a single cost-effective GPU, in both speed and context capacity, while raising the bar for data transparency and reproducibility. Its hybrid architecture, throughput lead, and high-quality open datasets are set to accelerate innovation across the AI ecosystem.


The technical details, paper, and models are available on Hugging Face.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
