Alibaba Open-Sources Qwen3-TTS Model Suite, Delivering Multilingual, Ultra-Low-Latency Speech Generation

January 22, 2026: Alibaba’s Qwen team has officially open-sourced the full Qwen3-TTS text-to-speech model family, featuring multi-codebook speech generation models in two sizes: 1.7B parameters for maximum performance and 0.6B parameters for a balance between quality and efficiency. The models are now available on GitHub, ModelScope, and other platforms, with live access supported via the Qwen API.

Qwen3-TTS offers a comprehensive feature set, including voice cloning, voice creation, human-like speech synthesis, and natural-language instruction control. Powered by the self-developed Qwen3-TTS-Tokenizer-12Hz multi-codebook speech encoder, the model preserves rich paralinguistic cues and acoustic environment details, enabling high-fidelity voice reconstruction.

A key innovation is its Dual-Track modeling architecture, which reduces end-to-end synthesis latency to just 97 milliseconds, with the first audio packet generated after a single character, making it well suited to real-time conversational applications.

The model supports 10 major languages, including Chinese, English, Japanese, and German, as well as several dialects. It can automatically adapt intonation, rhythm, and emotional expression based on semantic context, while remaining robust to noisy or imperfect text input. Across multiple benchmarks, Qwen3-TTS achieves state-of-the-art performance: its voice creation capabilities outperform MiniMax-Voice-Design, its cross-lingual voice cloning surpasses CosyVoice3, and its long-form speech generation achieves word error rates as low as 2.36% (Chinese) and 2.81% (English).
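
For readers unfamiliar with the metric, word error rate is conventionally the token-level edit distance between a reference transcript and the recognized transcript of the synthesized audio, divided by the number of reference tokens. The sketch below illustrates that general definition in Python. It is not the Qwen team's evaluation code, and the function name, whitespace tokenization, and sample sentences are assumptions made for illustration.

```python
def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """Edit distance between two token lists, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] is the edit distance between the reference prefix processed so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical transcripts: one substituted word out of five.
ref = "the quick brown fox jumps".split()
hyp = "the quick brown dog jumps".split()
print(f"{word_error_rate(ref, hyp):.2%}")  # prints 20.00%
```

For Chinese, evaluations commonly score per character rather than per whitespace-delimited word; the source does not say how its figures were tokenized.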

By combining multilingual support, ultra-low latency, and high audio quality, Qwen3-TTS provides an efficient and scalable solution for global voice interaction and real-time speech applications.

ModelScope: https://www.modelscope.cn/collections/Qwen/Qwen3-TTS
Hugging Face: https://huggingface.co/collections/Qwen/qwen3-tts
GitHub: https://github.com/QwenLM/Qwen3-TTS

Source: IThome
