AI & Machine Learning

Microsoft Unveils Maia 200, An FP4 and FP8 Optimized AI Inference Accelerator for Azure Datacenters

By NextTech | January 30, 2026


Maia 200 is Microsoft’s new in-house AI accelerator designed for inference in Azure datacenters. It targets the cost of token generation for large language models and other reasoning workloads by combining narrow-precision compute, a dense on-chip memory hierarchy, and an Ethernet-based scale-up fabric.

Why Microsoft built a dedicated inference chip

Training and inference stress hardware in different ways. Training needs very large all-to-all communication and long-running jobs. Inference cares about tokens per second, latency, and tokens per dollar. Microsoft positions Maia 200 as its most efficient inference system, with about 30% better performance per dollar than the latest hardware in its fleet.
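
As a sanity check on what "performance per dollar" buys, here is a minimal sketch of the tokens-per-dollar arithmetic; the throughput and hourly-cost figures are placeholders for illustration, not Microsoft numbers.

```python
# Illustrative tokens-per-dollar arithmetic (all inputs hypothetical)
# showing why performance per dollar is the headline inference metric.

def tokens_per_dollar(tokens_per_sec: float, cost_per_hour: float) -> float:
    """Tokens generated per dollar of accelerator time."""
    return tokens_per_sec * 3600 / cost_per_hour

baseline = tokens_per_dollar(tokens_per_sec=20_000, cost_per_hour=10.0)
# A 30% perf-per-dollar gain means the same dollar buys 1.3x the tokens.
improved = baseline * 1.3

print(f"baseline: {baseline:,.0f} tokens/$")
print(f"with a 30% perf/$ gain: {improved:,.0f} tokens/$")
```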

Maia 200 is part of a heterogeneous Azure stack. It will serve multiple models, including the latest GPT-5.2 models from OpenAI, and will power workloads in Microsoft Foundry and Microsoft 365 Copilot. The Microsoft Superintelligence team will use the chip for synthetic data generation and reinforcement learning to improve in-house models.

Core silicon and numeric specs

Each Maia 200 die is fabricated on TSMC’s 3 nanometer process. The chip integrates more than 140 billion transistors.

The compute pipeline is built around native FP8 and FP4 tensor cores. A single chip delivers more than 10 petaFLOPS in FP4 and more than 5 petaFLOPS in FP8, within a 750 W SoC TDP envelope.
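
The quoted figures alone bound the compute efficiency. A quick calculation, treating the "more than" specs as floors:

```python
# Back-of-the-envelope efficiency from the quoted specs: >10 PFLOPS FP4
# and >5 PFLOPS FP8 inside a 750 W SoC TDP, so the ratios are lower bounds.

FP4_PFLOPS = 10.0
FP8_PFLOPS = 5.0
TDP_W = 750.0

fp4_tflops_per_watt = FP4_PFLOPS * 1000 / TDP_W   # ~13.3 TFLOPS/W
fp8_tflops_per_watt = FP8_PFLOPS * 1000 / TDP_W   # ~6.7 TFLOPS/W

print(f"FP4: >= {fp4_tflops_per_watt:.1f} TFLOPS/W")
print(f"FP8: >= {fp8_tflops_per_watt:.1f} TFLOPS/W")
```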

Memory is split between stacked HBM and on-die SRAM. Maia 200 provides 216 GB of HBM3e with about 7 TB per second of bandwidth, plus 272 MB of on-die SRAM. The SRAM is organized into tile-level and cluster-level SRAM and is fully software managed: compilers and runtimes can place working sets explicitly to keep attention and GEMM kernels close to compute.
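
A short roofline-style calculation shows why this SRAM hierarchy matters: dividing the quoted FP4 peak by the quoted HBM bandwidth gives the arithmetic intensity a kernel needs to avoid being memory-bound.

```python
# Roofline-style check using the quoted numbers: a kernel is HBM-bound
# unless its arithmetic intensity (FLOPs per byte from HBM) exceeds
# peak-compute / peak-bandwidth. Quoted: 10 PFLOPS FP4, ~7 TB/s HBM3e.

PEAK_FLOPS = 10e15      # FP4 peak, FLOPs/s
HBM_BW = 7e12           # bytes/s

ridge_point = PEAK_FLOPS / HBM_BW   # ~1,429 FLOPs per HBM byte
print(f"compute-bound only above ~{ridge_point:,.0f} FLOPs/byte")

# Decode-phase GEMV in LLM inference runs at only a few FLOPs per weight
# byte, far below the ridge point -- which is why software-managed SRAM
# tiers that keep hot data off HBM matter so much for token generation.
```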

Tile-based microarchitecture and memory hierarchy

The Maia 200 microarchitecture is hierarchical. The base unit is the tile, the smallest autonomous compute-and-storage unit on the chip. Each tile includes a Tile Tensor Unit for high-throughput matrix operations and a Tile Vector Processor as a programmable SIMD engine. Tile SRAM feeds both units, and tile DMA engines move data in and out of SRAM without stalling compute. A Tile Control Processor orchestrates the sequence of tensor and DMA work.

Multiple tiles form a cluster. Each cluster exposes a larger, multi-banked Cluster SRAM shared across the tiles in that cluster. Cluster-level DMA engines move data between Cluster SRAM and the co-packaged HBM stacks. A cluster core coordinates multi-tile execution and uses redundancy schemes for tiles and SRAM to improve yield while keeping the same programming model.
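
As a structural summary of the description above, here is a minimal sketch of the hierarchy in code. The per-die tile and cluster counts and per-tier SRAM sizes are illustrative assumptions; only the 216 GB HBM and 272 MB total SRAM figures are quoted.

```python
# Minimal structural model of the tile/cluster hierarchy. Counts and
# per-tier capacities are hypothetical -- Microsoft has not published
# a per-die tile or cluster breakdown in this description.

from dataclasses import dataclass

@dataclass
class Tile:
    sram_kb: int          # tile-level SRAM, software managed

@dataclass
class Cluster:
    tiles: list[Tile]     # tiles sharing this cluster's SRAM
    cluster_sram_mb: int  # multi-banked, shared across tiles

@dataclass
class MaiaDie:
    clusters: list[Cluster]
    hbm_gb: int = 216         # quoted HBM3e capacity
    total_sram_mb: int = 272  # quoted tile + cluster SRAM combined

# Hypothetical shape: 8 clusters of 16 tiles each.
die = MaiaDie(clusters=[
    Cluster(tiles=[Tile(sram_kb=128) for _ in range(16)], cluster_sram_mb=16)
    for _ in range(8)
])
print(len(die.clusters), "clusters x", len(die.clusters[0].tiles), "tiles")
```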

This hierarchy lets the software stack pin different parts of the model in different tiers. For example, attention kernels can keep Q, K, and V tensors in tile SRAM, while collective-communication kernels can stage payloads in Cluster SRAM to reduce HBM pressure. The design goal is sustained high utilization as models grow in size and sequence length.
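
The placement decision reduces to a capacity check. Here is a sketch of the question a compiler or runtime would answer, with an assumed 128 KB of tile SRAM and FP8 tensors (both assumptions; per-tile capacity is not quoted):

```python
# Does the per-tile attention working set fit in tile SRAM?
# Tile capacity and model dimensions below are illustrative assumptions.

def attention_tile_bytes(seq_len: int, head_dim: int, bytes_per_elem: float) -> int:
    """Q, K, V tiles for one attention head at a given precision."""
    return int(3 * seq_len * head_dim * bytes_per_elem)

TILE_SRAM_BYTES = 128 * 1024          # assumed 128 KB of tile SRAM

for seq_chunk in (128, 512, 2048):
    need = attention_tile_bytes(seq_chunk, head_dim=128, bytes_per_elem=1)  # FP8
    fits = "fits in tile SRAM" if need <= TILE_SRAM_BYTES else "spills to cluster SRAM/HBM"
    print(f"seq chunk {seq_chunk:5d}: {need/1024:7.0f} KB -> {fits}")
```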

On-chip data movement and Ethernet scale-up fabric

Inference is often limited by data movement, not peak compute. Maia 200 uses a custom Network-on-Chip together with a hierarchy of DMA engines. The Network-on-Chip spans tiles, clusters, memory controllers, and I/O units, with separate planes for large tensor traffic and for small control messages. This separation keeps synchronization signals and small outputs from being blocked behind large transfers.
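
A toy model makes the benefit of the separate planes concrete: with one shared queue, a tiny synchronization message waits behind a bulk transfer, while a dedicated control plane delivers it immediately. All timings are arbitrary units, not Maia measurements.

```python
# Toy head-of-line-blocking illustration for split NoC planes.

from dataclasses import dataclass

@dataclass
class Msg:
    name: str
    cycles: int   # time to drain onto the link

bulk = Msg("64 MB tensor burst", cycles=10_000)
sync = Msg("8 B semaphore update", cycles=1)

# Single shared plane: FIFO order, the sync waits for the burst to finish.
shared_latency = bulk.cycles + sync.cycles

# Separate data/control planes: the control plane is never occupied by
# tensor traffic, so the sync message goes out immediately.
split_latency = sync.cycles

print(f"shared plane sync latency: {shared_latency} cycles")
print(f"separate control plane:    {split_latency} cycles")
```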

Beyond the chip boundary, Maia 200 integrates its own NIC and an Ethernet-based scale-up network that runs the AI Transport Layer protocol. The on-die NIC exposes about 1.4 TB per second in each direction, or 2.8 TB per second bidirectional, and scales to 6,144 accelerators in a two-tier domain.
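
From the quoted per-direction bandwidth you can estimate a bandwidth-limited lower bound on collective time. The sketch below uses a standard ring all-reduce cost model and ignores topology, latency, and protocol overhead:

```python
# Bandwidth-limit arithmetic from the quoted ~1.4 TB/s per-direction NIC.

NIC_BW = 1.4e12   # bytes/s, each direction

def ring_allreduce_seconds(tensor_bytes: float, n_devices: int) -> float:
    """A bandwidth-optimal ring all-reduce moves ~2*(n-1)/n of the tensor."""
    wire_bytes = 2 * (n_devices - 1) / n_devices * tensor_bytes
    return wire_bytes / NIC_BW

# Example: all-reduce a 1 GB activation tensor across 8 accelerators.
t = ring_allreduce_seconds(1e9, 8)
print(f"~{t*1e6:.0f} microseconds at the bandwidth limit")
```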

Within each tray, four Maia accelerators form a Fully Connected Quad with direct, non-switched links to one another. Most tensor-parallel traffic stays inside this group, while only lighter collective traffic goes out to switches. This improves latency and reduces switch port count for typical inference collectives.
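
The quad’s advantage is easy to quantify with basic combinatorics; the traffic model below is a generic reduce-scatter plus all-gather, used for illustration rather than a published Maia figure.

```python
# Fully Connected Quad: four devices with direct links need no switch
# hop for tensor-parallel collectives. Link counts are exact; the
# traffic model is a standard reduce-scatter + all-gather.

N = 4
direct_links_per_device = N - 1        # 3 non-switched links each
links_in_quad = N * (N - 1) // 2       # 6 direct links in total

# Reduce-scatter + all-gather over a fully connected group: each link
# carries about 2 * T / N per direction for a tensor of size T.
tensor_gb = 1.0
per_link_gb = 2 * tensor_gb / N
print(f"{direct_links_per_device} links/device, {links_in_quad} links total")
print(f"~{per_link_gb:.2f} GB per link per direction for a {tensor_gb:.0f} GB tensor")
```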

Azure system integration and cooling

At the system level, Maia 200 follows the same rack, power, and mechanical standards as Azure GPU servers. It supports air-cooled and liquid-cooled configurations and uses a second-generation closed-loop liquid-cooling Heat Exchanger Unit for high-density racks. This allows mixed deployments of GPUs and Maia accelerators in the same datacenter footprint.

The accelerator integrates with the Azure control plane. Firmware management, health monitoring, and telemetry use the same workflows as other Azure compute services, enabling fleet-wide rollouts and maintenance without disrupting running AI workloads.

Key Takeaways

Here are the key technical takeaways:

  • Inference-first design: Maia 200 is Microsoft’s first silicon and system platform built solely for AI inference, optimized for large-scale token generation in modern reasoning models and large language models.
  • Numeric specs and memory hierarchy: The chip is fabricated on TSMC’s 3 nm process, integrates about 140 billion transistors, and delivers more than 10 PFLOPS FP4 and more than 5 PFLOPS FP8, with 216 GB of HBM3e at 7 TB per second alongside 272 MB of on-chip SRAM split into tile and cluster SRAM and managed in software.
  • Performance versus other cloud accelerators: Microsoft reports about 30% better performance per dollar than the latest Azure inference systems, and claims 3x the FP4 performance of third-generation Amazon Trainium and higher FP8 performance than Google TPU v7 at the accelerator level.
  • Tile-based architecture and Ethernet fabric: Maia 200 organizes compute into tiles and clusters with local SRAM, DMA engines, and a Network-on-Chip, and exposes an integrated NIC with about 1.4 TB per second per direction of Ethernet bandwidth that scales to 6,144 accelerators, using Fully Connected Quad groups as the local tensor-parallel domain.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

