AI & Machine Learning

Liquid AI's LFM2-VL-3B Brings a 3B-Parameter Vision-Language Model (VLM) to Edge-Class Devices

By NextTech · October 25, 2025 · 5 Mins Read


Liquid AI released LFM2-VL-3B, a 3B-parameter vision-language model for image-text-to-text tasks. It extends the LFM2-VL family beyond the 450M and 1.6B variants. The model targets higher accuracy while preserving the speed profile of the LFM2 architecture. It is available on LEAP and Hugging Face under the LFM Open License v1.0.

Model overview and interface

LFM2-VL-3B accepts interleaved image and text inputs and produces text outputs. The model exposes a ChatML-like template. The processor inserts an image sentinel that is replaced with encoded image tokens at run time. The default text context length is 32,768 tokens. These details help developers reproduce evaluations and integrate the model with existing multimodal pipelines.

(Figure: LFM2-VL-3B model overview. Source: https://www.liquid.ai/blog/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge)

Architecture

The stack pairs a language tower with a shape-aware vision tower and a projector. The language tower is LFM2-2.6B, a hybrid convolution-plus-attention backbone. The vision tower is SigLIP2 NaFlex at 400M parameters; it preserves native aspect ratios and avoids distortion. The connector is a 2-layer MLP with pixel unshuffle; it compresses image tokens before fusion with the language space. This design lets users cap vision-token budgets without retraining the model.

The encoder processes native resolutions up to 512×512. Larger inputs are split into non-overlapping 512×512 patches. A thumbnail pathway provides global context during tiling. The token mapping is documented with concrete examples: a 256×384 image maps to 96 tokens, and a 1000×3000 image maps to 1,020 tokens. The model card exposes user controls for minimum and maximum image tokens and for the tiling switch. These controls tune speed and quality at inference time.
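As a sanity check on the documented mapping, the 96-token figure for a 256×384 image is consistent with a 16-pixel patch grid followed by 2×2 pixel unshuffle. The patch size and unshuffle factor here are our assumptions, not figures stated in the post, and the helper only covers images that fit within a single 512×512 tile:

```python
def estimate_image_tokens(width: int, height: int,
                          patch: int = 16, unshuffle: int = 2) -> int:
    """Estimate vision tokens for an image that fits in one 512x512 tile.

    Assumes a `patch`-pixel patch grid and an `unshuffle` x `unshuffle`
    pixel-unshuffle step in the projector (both assumed, not documented).
    """
    patches = (width // patch) * (height // patch)   # raw encoder patches
    return patches // (unshuffle ** 2)               # compressed by the projector

# Documented example: a 256x384 image maps to 96 tokens.
print(estimate_image_tokens(256, 384))  # -> 96
```

Under the same assumptions, a full 512×512 tile yields 256 tokens, which lines up with the recommended maximum image-token budget mentioned below.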

Inference settings

The Hugging Face model card provides recommended parameters. Text generation uses temperature 0.1, min-p 0.15, and a repetition penalty of 1.05. Vision settings use a minimum of 64 image tokens, a maximum of 256 image tokens, and image splitting enabled. The processor applies the chat template and the image sentinel automatically. The example uses AutoModelForImageTextToText and AutoProcessor with bfloat16 precision.
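A minimal sketch of these settings with the Transformers API, assuming the `LiquidAI/LFM2-VL-3B` repo id and that the processor accepts `min_image_tokens`, `max_image_tokens`, and `do_image_splitting` keyword arguments (assumptions based on the model card's description, not verified here):

```python
# Recommended decoding settings from the LFM2-VL-3B model card.
GEN_KWARGS = dict(
    do_sample=True,
    temperature=0.1,           # low temperature for stable outputs
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=256,
)

# Recommended vision-side controls (processor-level).
VISION_KWARGS = dict(
    min_image_tokens=64,       # lower bound on the per-image token budget
    max_image_tokens=256,      # upper bound; caps latency on large images
    do_image_splitting=True,   # tile inputs larger than 512x512
)

def describe(image_path: str, prompt: str) -> str:
    # Heavy dependencies are imported lazily so the settings above can be
    # inspected without torch / transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoModelForImageTextToText, AutoProcessor

    model_id = "LiquidAI/LFM2-VL-3B"  # repo id assumed from the release
    processor = AutoProcessor.from_pretrained(model_id, **VISION_KWARGS)
    model = AutoModelForImageTextToText.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    conversation = [{
        "role": "user",
        "content": [
            {"type": "image", "image": Image.open(image_path)},
            {"type": "text", "text": prompt},
        ],
    }]
    # The processor applies the ChatML-like template and inserts the
    # image sentinel automatically.
    inputs = processor.apply_chat_template(
        conversation,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(**inputs, **GEN_KWARGS)
    return processor.batch_decode(out, skip_special_tokens=True)[0]

# Example (requires the checkpoint download and a GPU-class machine):
#   print(describe("photo.jpg", "Describe this image."))
```

The interesting design point is that speed/quality trade-offs live entirely in `VISION_KWARGS`: tightening `max_image_tokens` shrinks the vision budget without touching the weights.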

How is it trained?

Liquid AI describes a staged approach. The team performs joint mid-training that adjusts the text-to-image ratio over time. The model then undergoes supervised fine-tuning focused on image understanding. The data sources are large-scale open datasets plus in-house synthetic vision data for task coverage.

Benchmarks

The research team reports competitive results among lightweight open VLMs. On MM-IFEval the model reaches 51.83. On RealWorldQA it reaches 71.37. On MMBench-dev-en it reaches 79.81. The POPE score is 89.01. The table notes that scores for other systems were computed with VLMEvalKit. The table excludes Qwen3-VL-2B because that system was released one day earlier.

(Figure: benchmark comparison table for lightweight open VLMs. Source: https://www.liquid.ai/blog/lfm2-vl-3b-a-new-efficient-vision-language-for-the-edge)

The language capability stays close to the LFM2-2.6B backbone. The research team cites 30 percent on GPQA and 63 percent on MMLU. This matters when perception tasks include knowledge queries. The team also states expanded multilingual visual understanding across English, Japanese, French, Spanish, German, Italian, Portuguese, Arabic, Chinese, and Korean.

Why should edge users care?

The architecture keeps compute and memory within small-device budgets. Image tokens are compressible and user-constrained, so throughput is predictable. The SigLIP2 400M NaFlex encoder preserves aspect ratios, which helps fine-grained perception. The projector reduces tokens at the connector, which improves tokens per second. The research team also published a GGUF build for on-device runtimes. These properties are useful for robotics, mobile, and industrial clients that need local processing and strict data boundaries.

Key Takeaways

  1. Compact multimodal stack: the 3B-parameter LFM2-VL-3B pairs an LFM2-2.6B language tower with a 400M SigLIP2 NaFlex vision encoder and a 2-layer MLP projector for image-token fusion. NaFlex preserves native aspect ratios.
  2. Resolution handling and token budgets: images run natively up to 512×512; larger inputs tile into non-overlapping 512×512 patches with a thumbnail pathway for global context. Documented token mappings include 256×384 → 96 tokens and 1000×3000 → 1,020 tokens.
  3. Inference interface: ChatML-like prompting with an image sentinel, a default text context of 32,768 tokens, recommended decoding settings, and processor-level controls for image splitting enable reproducible evaluation and easy integration in multimodal pipelines.
  4. Measured performance: reported results include MM-IFEval 51.83, RealWorldQA 71.37, MMBench-dev-en 79.81, and POPE 89.01. Language-only signals from the backbone are about 30% on GPQA and 63% on MMLU, useful for mixed perception-plus-knowledge workloads.

LFM2-VL-3B is a practical step for edge multimodal workloads. The 3B stack pairs LFM2-2.6B with a 400M SigLIP2 NaFlex encoder and an efficient projector, which lowers image-token counts for predictable latency. Native-resolution processing with 512×512 tiling and token caps gives deterministic budgets. Reported scores on MM-IFEval, RealWorldQA, MMBench, and POPE are competitive for this size. Open weights, a GGUF build, and LEAP access reduce integration friction. Overall, this is an edge-ready VLM release with clear controls and transparent benchmarks.



Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.
