AI & Machine Learning

Alibaba Qwen Team Releases Qwen3.5-397B MoE Model with 17B Active Parameters and 1M Token Context for AI Agents

By NextTech | February 16, 2026 | 5 Mins Read


Alibaba Cloud just updated the open-source landscape. Today, the Qwen team released Qwen3.5, the latest generation of its large language model (LLM) family. The most powerful version is Qwen3.5-397B-A17B, a sparse Mixture-of-Experts (MoE) system that combines massive reasoning power with high efficiency.

Qwen3.5 is a native vision-language model designed specifically for AI agents. It can see, code, and reason across 201 languages.

[Screenshot source: https://qwen.ai/blog?id=qwen3.5]

The Core Architecture: 397B Total, 17B Active

The technical specifications of Qwen3.5-397B-A17B are impressive. The model contains 397B total parameters, but its sparse MoE design means it activates only 17B parameters during any single forward pass.

This 17B activation count is the critical number for developers. It allows the model to offer the intelligence of a 400B model while running at the speed of a much smaller one. The Qwen team reports an 8.6x to 19.0x increase in decoding throughput compared to earlier generations. This efficiency addresses the high cost of running large-scale AI.
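To make the sparse-activation idea concrete, here is a minimal PyTorch sketch of top-k MoE routing. The dimensions, expert count, and top-k value are illustrative toy values, not Qwen3.5's real configuration; the point is simply that only the selected experts' parameters run for each token.

```python
import torch
import torch.nn as nn

# Toy sparse MoE layer: a router scores experts per token, and only the
# top-k experts execute. All sizes here are illustrative, not Qwen3.5's.
dim, num_experts, k = 64, 16, 2
router = nn.Linear(dim, num_experts)
experts = nn.ModuleList(nn.Linear(dim, dim) for _ in range(num_experts))

def moe_forward(x: torch.Tensor) -> torch.Tensor:
    weights, idx = torch.topk(router(x).softmax(dim=-1), k, dim=-1)  # (tokens, k)
    weights = weights / weights.sum(dim=-1, keepdim=True)  # renormalize over chosen experts
    out = torch.zeros_like(x)
    for slot in range(k):  # only k of num_experts fire per token
        for e in idx[:, slot].unique():
            mask = idx[:, slot] == e
            out[mask] += weights[mask, slot].unsqueeze(-1) * experts[int(e)](x[mask])
    return out

tokens = torch.randn(8, dim)
print(moe_forward(tokens).shape)  # torch.Size([8, 64])
```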

[Screenshot source: https://qwen.ai/blog?id=qwen3.5]

Efficient Hybrid Architecture: Gated Delta Networks

Qwen3.5 does not use a standard Transformer design. It uses an 'Efficient Hybrid Architecture.' Most LLMs rely solely on attention mechanisms, which can become slow with long text; Qwen3.5 combines Gated Delta Networks (linear attention) with Mixture-of-Experts (MoE).

The model consists of 60 layers with a hidden dimension of 4,096. These layers follow a specific hybrid layout that groups them into sets of 4:

  • 3 blocks use Gated DeltaNet plus MoE.
  • 1 block uses Gated Attention plus MoE.
  • This pattern repeats 15 times to reach 60 layers.

Technical details include (summarized in the sketch after this list):

  • Gated DeltaNet: It uses 64 linear attention heads for values (V) and 16 heads for queries and keys (QK).
  • MoE structure: The model has 512 total experts. Each token activates 10 routed experts and 1 shared expert, i.e., 11 active experts per token.
  • Vocabulary: The model uses a padded vocabulary of 248,320 tokens.
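For reference, here is a small, unofficial sketch that encodes the reported layout and expert counts as a config object. The class and field names are illustrative, not Alibaba's code; only the numbers come from the announcement.

```python
from dataclasses import dataclass

# Unofficial config sketch of the reported Qwen3.5-397B-A17B structure.
@dataclass
class Qwen35ConfigSketch:
    num_layers: int = 60
    hidden_size: int = 4096
    total_experts: int = 512
    routed_experts_per_token: int = 10
    shared_experts: int = 1
    vocab_size: int = 248_320  # padded vocabulary

    def layer_types(self) -> list[str]:
        # Blocks of 4: three Gated DeltaNet layers, then one Gated Attention layer.
        pattern = ["delta_net"] * 3 + ["gated_attention"]
        return [pattern[i % 4] for i in range(self.num_layers)]

cfg = Qwen35ConfigSketch()
types = cfg.layer_types()
print(types.count("delta_net"), types.count("gated_attention"))  # 45 15
print(cfg.routed_experts_per_token + cfg.shared_experts)         # 11 active experts
```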

Native Multimodal Training: Early Fusion

Qwen3.5 is a native vision-language model. Many other models add vision capabilities later; Qwen3.5 instead used 'Early Fusion' training, meaning the model learned from images and text at the same time.

The training used trillions of multimodal tokens. This makes Qwen3.5 better at visual reasoning than earlier Qwen3-VL versions, and highly capable at 'agentic' tasks. For example, it can look at a UI screenshot and generate the corresponding HTML and CSS code. It can also analyze long videos with second-level accuracy.

The model supports the Model Context Protocol (MCP) and handles complex function calling. These features are essential for building agents that control apps or browse the web. On the IFBench test it scored 76.5, beating many proprietary models.
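As a rough illustration of what function calling looks like from the developer side, the sketch below uses the OpenAI-compatible chat-completions style that many hosted LLM endpoints expose. The base URL, model id, and the open_url tool are placeholders, not confirmed details from the announcement.

```python
from openai import OpenAI

# Hypothetical endpoint and model id; replace with your provider's real values.
client = OpenAI(base_url="https://example-endpoint/v1", api_key="YOUR_KEY")

# An illustrative tool an agent might expose to the model.
tools = [{
    "type": "function",
    "function": {
        "name": "open_url",
        "description": "Open a web page in the agent-controlled browser.",
        "parameters": {
            "type": "object",
            "properties": {"url": {"type": "string"}},
            "required": ["url"],
        },
    },
}]

resp = client.chat.completions.create(
    model="qwen3.5-plus",  # placeholder model id
    messages=[{"role": "user", "content": "Open the Qwen blog."}],
    tools=tools,
)
print(resp.choices[0].message.tool_calls)  # structured call(s) the model requested
```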

[Screenshot source: https://qwen.ai/blog?id=qwen3.5]

Solving the Memory Wall: 1M Context Length

Long-form data processing is a core feature of Qwen3.5. The base model has a native context window of 262,144 (256K) tokens, and the hosted Qwen3.5-Plus version goes even further, supporting 1M tokens.

The Alibaba Qwen team used a new asynchronous Reinforcement Learning (RL) framework for this, which keeps the model accurate even at the end of a 1M-token document. For developers, this means you can feed an entire codebase into one prompt; you don't always need a complex Retrieval-Augmented Generation (RAG) system.
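As a quick sanity check before skipping RAG, you can estimate whether a codebase even fits in the window. The sketch below uses a crude 4-characters-per-token heuristic; real counts depend on the model's tokenizer.

```python
from pathlib import Path

CONTEXT_LIMIT = 1_000_000  # Qwen3.5-Plus advertised window

def estimate_tokens(root: str, exts=(".py", ".md", ".txt")) -> int:
    """Very rough token estimate: ~4 characters per token (heuristic)."""
    chars = sum(
        len(p.read_text(errors="ignore"))
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in exts
    )
    return chars // 4

n = estimate_tokens(".")
print(f"~{n:,} tokens; fits in 1M window: {n < CONTEXT_LIMIT}")
```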

Performance and Benchmarks

The model excels in technical fields. It achieved high scores on Humanity's Last Exam (HLE-Verified), a difficult benchmark of AI knowledge.

  • Coding: It shows parity with top-tier closed-source models.
  • Math: The model uses 'Adaptive Tool Use.' It can write Python code to solve math problems, then run that code to verify the answer (sketched below).
  • Languages: It supports 201 languages and dialects, a big jump from the 119 languages in the previous version.
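The announcement does not publish the tool-use loop itself, so the following is only a minimal sketch of the execute-and-verify pattern: model-written Python runs in a subprocess, and its output is checked against the model's claimed answer. The run_python helper is hypothetical.

```python
import subprocess
import sys
import tempfile

def run_python(code: str, timeout: float = 5.0) -> str:
    """Execute model-written code in a separate process and capture its output."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    proc = subprocess.run([sys.executable, path], capture_output=True,
                          text=True, timeout=timeout)
    return proc.stdout.strip() or proc.stderr.strip()

# Example: verify a model's claimed answer to "sum of squares of 1..10".
claimed = "385"
result = run_python("print(sum(i * i for i in range(1, 11)))")
print(result, result == claimed)  # 385 True
```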

Key Takeaways

  • Hybrid Efficiency (MoE + Gated Delta Networks): Qwen3.5 uses a 3:1 ratio of Gated Delta Network (linear attention) blocks to standard Gated Attention blocks across 60 layers. This hybrid design enables an 8.6x to 19.0x increase in decoding throughput compared to earlier generations.
  • Massive Scale, Low Footprint: Qwen3.5-397B-A17B has 397B total parameters but activates only 17B per token. You get 400B-class intelligence with the inference speed and memory requirements of a much smaller model.
  • Native Multimodal Foundation: Unlike 'bolted-on' vision models, Qwen3.5 was trained via Early Fusion on trillions of text and image tokens simultaneously. This makes it a top-tier visual agent, scoring 76.5 on IFBench for following complex instructions in visual contexts.
  • 1M Token Context: While the base model supports a native 256K-token context, the hosted Qwen3.5-Plus handles up to 1M tokens. This massive window lets developers process entire codebases or two-hour videos without complex RAG pipelines.

Check out the technical details, model weights, and GitHub repo.

