StreamTensor: A PyTorch-to-Accelerator Compiler that Streams LLM Intermediates Across FPGA Dataflows

By NextTech | October 6, 2025

Why treat LLM inference as batched kernels that round-trip through DRAM when a dataflow compiler can pipe tiles through on-chip FIFOs and stream converters? StreamTensor is a compiler that lowers PyTorch LLM graphs (GPT-2, Llama, Qwen, Gemma) into stream-scheduled dataflow accelerators on AMD’s Alveo U55C FPGA. The system introduces an iterative tensor (“itensor”) type that encodes the tiling and order of streams, enabling provably correct inter-kernel streaming and automatic insertion and sizing of DMA engines, FIFOs, and layout converters. On LLM decoding workloads, the research team reports latency as low as 0.64× that of GPUs and up to 1.99× higher energy efficiency.

[Figure; source: https://arxiv.org/pdf/2509.13694]

What does StreamTensor do?

StreamTensor compiles PyTorch graphs into a stream-oriented dataflow design in which intermediate tiles largely avoid off-chip DRAM round-trips thanks to on-chip streaming and fusion. Tiles are forwarded through on-chip FIFOs to downstream kernels, and DMAs are inserted only when required. The compiler’s central abstraction, the iterative tensor (itensor), records iteration order, tiling, and layout, which makes inter-kernel stream compatibility explicit and drives converter generation only where needed. The framework also searches hierarchically over tiling, fusion, and resource allocation, and uses a linear program to size FIFOs so as to avoid stalls or deadlock while minimizing on-chip memory.
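To make the idea of stream compatibility concrete, here is a minimal Python sketch. It is not StreamTensor's implementation: `ToyITensor`, `stream`, and `needs_converter` are hypothetical names, and the model assumes a tensor whose shape is evenly divided by its tile shape. It only illustrates why recording tile shape and iteration order lets a compiler decide whether a producer can feed a consumer directly over a FIFO or whether a converter is needed.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterator, Tuple


@dataclass(frozen=True)
class ToyITensor:
    """Hypothetical stand-in for an itensor: shape, tile shape, and loop order."""
    shape: Tuple[int, ...]       # full tensor shape, e.g. (4, 4)
    tile: Tuple[int, ...]        # tile shape; assumed to divide `shape` evenly
    loop_order: Tuple[int, ...]  # dimensions listed from outermost to innermost loop

    def stream(self) -> Iterator[Tuple[int, ...]]:
        """Yield tile origins in the order they would travel over a stream."""
        counts = [self.shape[d] // self.tile[d] for d in self.loop_order]
        for idx in product(*(range(c) for c in counts)):
            origin = [0] * len(self.shape)
            for dim, i in zip(self.loop_order, idx):
                origin[dim] = i * self.tile[dim]
            yield tuple(origin)


def needs_converter(producer: ToyITensor, consumer: ToyITensor) -> bool:
    """A plain FIFO suffices only if both sides agree on tile shape and order."""
    if producer.shape != consumer.shape:
        raise ValueError("producer and consumer must describe the same tensor")
    if producer.tile != consumer.tile:
        return True
    return list(producer.stream()) != list(consumer.stream())


row_major = ToyITensor(shape=(4, 4), tile=(2, 2), loop_order=(0, 1))
col_major = ToyITensor(shape=(4, 4), tile=(2, 2), loop_order=(1, 0))

print(list(row_major.stream()))  # tiles walked row by row
print(list(col_major.stream()))  # same tiles, walked column by column
print(needs_converter(row_major, row_major))
print(needs_converter(row_major, col_major))
```

In this toy model, two kernels that walk the same tiles in the same order can be connected by a FIFO alone, while a mismatch in order or tile shape is the signal to insert a reordering buffer between them.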

[Figure; source: https://arxiv.org/pdf/2509.13694]

What’s actually new?

  • Hierarchical DSE. The compiler explores three design spaces: (i) tiling/unroll/vectorization/permutation at the Linalg level, (ii) fusion under memory/resource constraints, and (iii) resource allocation/stream widths. It optimizes for sustained throughput under bandwidth limits.
  • End-to-end PyTorch → device flow. Models enter via Torch-MLIR, are transformed to MLIR Linalg, and then into a dataflow IR whose nodes become hardware kernels with explicit streams and host/runtime glue, with no manual RTL assembly.
  • Iterative tensor (itensor) type system. A first-class tensor type expresses iteration order, tiling, and affine maps. This makes stream order explicit, allows safe kernel fusion, and lets the compiler synthesize minimal buffer/format converters when producers and consumers disagree.
  • Formal FIFO sizing. Inter-kernel buffering is solved with a linear-programming formulation to avoid stalls and deadlocks while minimizing on-chip memory usage (BRAM/URAM).
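The FIFO-sizing point can be illustrated with a deliberately simplified Python sketch. It does not reproduce the paper's linear program. Instead it swaps in a longest-path slack computation under strong assumptions: every kernel has a fixed startup latency in cycles, then emits one token per cycle, and a kernel with several inputs starts only once all of them have a token. The function name `toy_fifo_depths` and the kernel names are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple


def toy_fifo_depths(
    latency: Dict[str, int],
    edges: List[Tuple[str, str]],
) -> Dict[Tuple[str, str], int]:
    """Return a per-edge FIFO depth (in tokens) under the toy timing model."""
    preds: Dict[str, List[str]] = {k: [] for k in latency}
    for src, dst in edges:
        preds[dst].append(src)

    # start[k]: earliest cycle at which kernel k has a token on every input.
    start: Dict[str, int] = {}
    for k in TopologicalSorter(preds).static_order():
        start[k] = max((start[p] + latency[p] for p in preds[k]), default=0)

    depths: Dict[Tuple[str, str], int] = {}
    for src, dst in edges:
        # Tokens that pile up while the consumer waits for its slowest input.
        slack = start[dst] - (start[src] + latency[src])
        depths[(src, dst)] = max(1, slack)
    return depths


# A reconvergent ("diamond") graph: a skip path races a slow matmul path.
latency = {"load": 2, "matmul": 8, "add": 1}
edges = [("load", "matmul"), ("matmul", "add"), ("load", "add")]
depths = toy_fifo_depths(latency, edges)
for edge, depth in depths.items():
    print(edge, depth)
```

The skip edge from `load` to `add` needs the deepest FIFO because its tokens arrive eight cycles before the `matmul` path delivers its first result. If that FIFO were shallower, `load` would stall and starve `matmul`, which is the kind of stall or deadlock a formal sizing step is meant to rule out.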

Results

Latency: as low as 0.76× that of prior FPGA LLM accelerators and 0.64× that of a GPU baseline on GPT-2. Energy efficiency: up to 1.99× vs. an A100 on emerging LLMs (model-dependent). Platform context: Alveo U55C (16 GB HBM2, 460 GB/s, PCIe Gen3×16 or dual Gen4×8, 2×QSFP28).
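Because the latency figures are ratios where lower is better, it can help to convert them into the more familiar speedup form. The short Python sketch below assumes each figure means StreamTensor latency divided by baseline latency, which matches the article's wording but is an interpretation rather than something stated outright. The helper name `speedup_from_latency_ratio` is hypothetical.

```python
def speedup_from_latency_ratio(ratio: float) -> float:
    """Convert (our latency / baseline latency) into (baseline / our latency)."""
    if ratio <= 0:
        raise ValueError("latency ratio must be positive")
    return 1.0 / ratio


# Ratios quoted in the article; the labels are descriptive only.
reported = [("prior FPGA LLM accelerators", 0.76), ("GPU baseline on GPT-2", 0.64)]
for label, ratio in reported:
    speedup = speedup_from_latency_ratio(ratio)
    print(f"vs {label}: {ratio:.2f}x latency is about {speedup:.2f}x faster")
```

Under that reading, 0.64× latency corresponds to roughly a 1.56× speedup and 0.76× to roughly 1.32×. These are best-case figures, not averages across all models.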

[Figure; source: https://arxiv.org/pdf/2509.13694]

The useful contribution here is a PyTorch→Torch-MLIR→dataflow compiler that emits stream-scheduled kernels and a host/runtime for AMD’s Alveo U55C; the iterative tensor type plus linear-programming-based FIFO sizing enables safe inter-kernel streaming rather than DRAM round-trips. On reported LLM decoding benchmarks across GPT-2, Llama, Qwen, and Gemma, the research team shows geometric-mean latency as low as 0.64× vs. a GPU baseline and energy efficiency up to 1.99×, with scope limited to decoding workloads. The hardware context is clear: the Alveo U55C provides 16 GB of HBM2 at 460 GB/s with dual QSFP28 and PCIe Gen3×16 or dual Gen4×8, which aligns with the streaming dataflow design.


Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
