AI & Machine Learning

Liquid AI’s New LFM2-24B-A2B Hybrid Architecture Blends Attention with Convolutions to Solve the Scaling Bottlenecks of Modern LLMs

By NextTech · February 25, 2026 · 4 Mins Read


The generative AI race has long been a game of ‘bigger is better.’ But as the industry hits the limits of power consumption and memory bottlenecks, the conversation is shifting from raw parameter counts to architectural efficiency. The Liquid AI team is leading this charge with the release of LFM2-24B-A2B, a 24-billion-parameter model that redefines what we should expect from edge-capable AI.

Source: https://www.liquid.ai/blog/lfm2-24b-a2b

The ‘A2B’ Architecture: A 1:3 Ratio for Efficiency

The ‘A2B’ in the model’s name stands for Attention-to-Base. In a standard Transformer, every layer uses softmax attention, which scales quadratically (O(N²)) with sequence length. This leads to huge KV (key-value) caches that consume VRAM.
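The memory pressure from quadratic attention is easiest to see in the KV cache. A rough sizing sketch follows; the head count and head dimension are illustrative assumptions, not published LFM2 internals:

```python
# Back-of-the-envelope KV-cache sizing. A dense Transformer caches K and V
# in every layer; a hybrid only caches them in its attention layers.

def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_val=2):
    """Memory for Key + Value tensors across the given layers (fp16)."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_val

# Dense 40-layer model: KV cached in all 40 layers at full 32k context.
dense = kv_cache_bytes(layers=40, kv_heads=8, head_dim=128, seq_len=32_768)
# Hybrid: only the 10 attention layers hold a KV cache.
hybrid = kv_cache_bytes(layers=10, kv_heads=8, head_dim=128, seq_len=32_768)

print(dense // 2**20, "MiB vs", hybrid // 2**20, "MiB")  # 5120 MiB vs 1280 MiB
```

With these (assumed) dimensions, the hybrid layout cuts the cache to a quarter of the dense model's, before any gains from the convolution layers' constant-size state.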

The Liquid AI team bypasses this by using a hybrid structure. The ‘Base‘ layers are efficient gated short-convolution blocks, while the ‘Attention‘ layers use Grouped Query Attention (GQA).

In the LFM2-24B-A2B configuration, the model uses a 1:3 ratio:

  • Total Layers: 40
  • Convolution Blocks: 30
  • Attention Blocks: 10

By interspersing a small number of GQA blocks among a majority of gated convolution layers, the model retains the high-resolution retrieval and reasoning of a Transformer while maintaining the fast prefill and low memory footprint of a linear-complexity model.
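A minimal sketch of such a 1:3 layer schedule; even spacing of the attention blocks is an illustrative assumption, since the post does not disclose their exact placement:

```python
# Build a 40-layer schedule with one GQA block every 4th position,
# yielding 10 attention blocks interspersed among 30 conv blocks.

def build_schedule(total=40, attention=10):
    stride = total // attention  # here: one GQA block every 4 layers
    return ["GQA" if (i + 1) % stride == 0 else "conv" for i in range(total)]

schedule = build_schedule()
print(schedule.count("conv"), schedule.count("GQA"))  # 30 10
print(schedule[:8])  # ['conv', 'conv', 'conv', 'GQA', 'conv', 'conv', 'conv', 'GQA']
```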

Sparse MoE: 24B Intelligence on a 2B Budget

The most important element of LFM2-24B-A2B is its Mixture of Experts (MoE) design. While the model contains 24 billion parameters, it activates only 2.3 billion parameters per token.

This is a game-changer for deployment. Because the active parameter path is so lean, the model can fit into 32GB of RAM. This means it can run locally on high-end consumer laptops, desktops with integrated GPUs (iGPUs), and dedicated NPUs, without needing a data-center-grade A100. It effectively offers the knowledge density of a 24B model with the inference speed and energy efficiency of a 2B model.
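The arithmetic behind the 32GB claim can be sketched as follows; the 8-bit quantization width is an assumption for illustration, not a published spec:

```python
# All expert weights must reside in memory, but each token only pays
# compute for the active path. Parameter counts are from the article.

TOTAL_PARAMS  = 24e9   # all experts, resident in RAM
ACTIVE_PARAMS = 2.3e9  # parameters actually used per token

# Weight storage at 8-bit quantization (1 byte/param, assumed):
resident_gb = TOTAL_PARAMS * 1 / 1e9   # ~24 GB -> fits a 32 GB machine

# Per-token compute scales with the active path, not the total
# (roughly 2 FLOPs per active parameter for a forward pass):
flops_per_token = 2 * ACTIVE_PARAMS    # ~4.6 GFLOPs per token

print(round(resident_gb), "GB resident,", flops_per_token / 1e9, "GFLOPs/token")
```

This is the sense in which the model offers 24B knowledge density at 2B inference cost: memory holds the whole model, but latency tracks the 2.3B active parameters.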


Benchmarks: Punching Up

The Liquid AI team reports that the LFM2 family follows predictable, log-linear scaling behavior. Despite its smaller active parameter count, the 24B-A2B model consistently outperforms larger competitors.

  • Logic and Reasoning: In tests like GSM8K and MATH-500, it rivals dense models twice its size.
  • Throughput: When benchmarked on a single NVIDIA H100 using vLLM, it reached 26.8K total tokens per second at 1,024 concurrent requests, significantly outpacing gpt-oss-20b and Qwen3-30B-A3B.
  • Long Context: The model features a 32k-token context window, optimized for privacy-sensitive RAG (Retrieval-Augmented Generation) pipelines and local document analysis.
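A quick sanity check on what the aggregate throughput figure means for an individual request:

```python
# 26.8K total tokens/s served across 1,024 concurrent requests
# works out to the per-stream generation rate each client sees.
aggregate_tps = 26_800
concurrency   = 1_024

per_request_tps = aggregate_tps / concurrency
print(round(per_request_tps, 1))  # 26.2
```

Roughly 26 tokens per second per stream, i.e. each of the 1,024 concurrent users still gets faster-than-reading-speed generation from a single H100.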

Technical Cheat Sheet

  • Total Parameters: 24 Billion
  • Active Parameters: 2.3 Billion
  • Architecture: Hybrid (Gated Conv + GQA)
  • Layers: 40 (30 Base / 10 Attention)
  • Context Length: 32,768 Tokens
  • Training Data: 17 Trillion Tokens
  • License: LFM Open License v1.0
  • Native Support: llama.cpp, vLLM, SGLang, MLX
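Given the listed llama.cpp support, a local run might look like the following sketch; the GGUF filename and quantization level are hypothetical, so check the official model card for the real artifact names:

```shell
# Hypothetical local inference via llama.cpp at the full 32k context.
# -ngl 999 offloads as many layers as possible to an available GPU/iGPU.
llama-cli -m lfm2-24b-a2b-Q4_K_M.gguf -c 32768 -ngl 999 \
  -p "Summarize this document:"
```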

Key Takeaways

  • Hybrid ‘A2B’ Architecture: The model uses a 1:3 ratio of Grouped Query Attention (GQA) to gated short convolutions. By employing linear-complexity ‘Base’ layers for 30 out of 40 layers, the model achieves much faster prefill and decode speeds with a significantly reduced memory footprint compared to traditional all-attention Transformers.
  • Sparse MoE Efficiency: Despite having 24 billion total parameters, the model activates only 2.3 billion parameters per token. This sparse Mixture of Experts design lets it deliver the reasoning depth of a large model while maintaining the inference latency and energy efficiency of a 2B-parameter model.
  • True Edge Capability: Optimized via hardware-in-the-loop architecture search, the model is designed to fit in 32GB of RAM. This makes it fully deployable on consumer-grade hardware, including laptops with integrated GPUs and NPUs, without requiring expensive data-center infrastructure.
  • State-of-the-Art Performance: LFM2-24B-A2B outperforms larger competitors like Qwen3-30B-A3B and gpt-oss-20b in throughput. Benchmarks show it hits roughly 26.8K tokens per second on a single H100, demonstrating near-linear scaling and high efficiency in long-context tasks up to its 32k-token window.

Check out the technical details and model weights.

