Alibaba has just taken a big step forward by introducing a line of small AI models that can run entirely on your current smartphone. Qwen 3.5's smaller siblings pack a punch that used to require a full server setup, and even the smallest of them runs smoothly on an iPhone 17.
The new Qwen 3.5 by @Alibaba_Qwen running on-device on iPhone 17 Pro.
Qwen 3.5 beats models 4 times its size, has strong visual understanding, and can toggle reasoning on or off.
The 2B 6-bit model here is running with MLX optimized for Apple Silicon. pic.twitter.com/GsGGzur0og
— Adrien Grondin (@adrgrondin) March 2, 2026
Alibaba's Qwen team released four new models today: 0.8 billion, 2 billion, 4 billion, and 9 billion parameter versions. Based on the same upgraded architecture as the larger Qwen 3.5 series, which launched in February 2026, these smaller versions were built for efficiency and natively handle both text and images. The 0.8B and 2B versions are suited to phones, laptops, and edge hardware where memory and battery life are critical, while the 4B is designed for lightweight tasks, and the 9B model approaches the capabilities of larger models in reasoning, math, multilingual knowledge, and document analysis.

The results are striking. Alibaba claims that the 9B variant produces results nearly on par with systems of 120 billion parameters, implying that it matches (and in some cases outperforms) heavy hitters like ChatGPT and Gemini in a series of key tests. The 4B variant, by contrast, performs at levels comparable to earlier 80B models. These compact models may have sacrificed some depth in favor of speed and resource economy, but they can still handle basic image recognition and text tasks.
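To put those claimed comparisons in perspective, here is a minimal Python sketch that computes the parameter-count ratios implied by the figures above. The pairings are Alibaba's claims as reported here, the function name is purely illustrative, and a size ratio says nothing on its own about benchmark quality.

```python
def size_ratio(large_params_b: float, small_params_b: float) -> float:
    """Return how many times larger the big model is, by parameter count (in billions)."""
    return large_params_b / small_params_b


# Pairings taken from the claims reported in the article: (larger model, Qwen 3.5 small model)
claims = {
    "9B vs 120B-class systems": (120.0, 9.0),
    "4B vs earlier 80B models": (80.0, 4.0),
}

for label, (large, small) in claims.items():
    print(f"{label}: roughly {size_ratio(large, small):.1f}x fewer parameters")
```

By this measure, the 4B comparison is the more aggressive of the two claims, since it spans a larger gap in parameter count.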
Community testing has shown that these models can run on a mobile device using tools such as MLX, with some users even fitting a quantized version of the 2B model onto an iPhone 17. You get fast responses without going online, paying subscription fees, or sending data to servers. The 0.8B and 2B variants are offline-ready and compatible with ordinary phones, and users say that the 4B model delivers nearly as much power in real-world use as the larger models. The open-source weights are available on Hugging Face and ModelScope, so deployment is fairly simple using familiar frameworks.
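A rough back-of-the-envelope calculation helps explain why a quantized 2B model fits on a phone. The Python sketch below estimates weight storage from parameter count and bits per weight. Only the 2B 6-bit combination comes from the demo mentioned above; the other bit widths are illustrative assumptions, and the estimate ignores quantization metadata, the vision encoder, activations, and the KV cache, so real memory use will be higher.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Parameter counts from the article; bit widths other than 6 are assumptions.
MODEL_SIZES_B = [0.8, 2.0, 4.0, 9.0]
BIT_WIDTHS = [16, 8, 6, 4]

for size in MODEL_SIZES_B:
    row = ", ".join(
        f"{bits}-bit: {weight_footprint_gb(size, bits):.2f} GB" for bits in BIT_WIDTHS
    )
    print(f"{size}B -> {row}")
```

Under these assumptions, the 2B model at 6 bits per weight needs about 1.5 GB for its weights, compared with about 4 GB at 16 bits, which is the difference between a comfortable fit and a tight one on a phone.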

