
Meet ‘Kani-TTS-2’: A 400M Param Open Source Text-to-Speech Model that Runs in 3GB VRAM with Voice Cloning Support

By NextTech | February 15, 2026 | 4 Mins Read
The landscape of generative audio is shifting toward efficiency. A new open-source contender, Kani-TTS-2, has been released by the team at nineninesix.ai. This model marks a departure from heavy, compute-expensive TTS systems. Instead, it treats audio as a language, delivering high-fidelity speech synthesis with a remarkably small footprint.

Kani-TTS-2 offers a lean, high-performance alternative to closed-source APIs. It is currently available on Hugging Face in both English (EN) and Portuguese (PT) versions.

The Architecture: LFM2 and NanoCodec

Kani-TTS-2 follows the ‘Audio-as-Language’ philosophy. The model does not use traditional mel-spectrogram pipelines. Instead, it converts raw audio into discrete tokens using a neural codec.
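To make ‘discrete tokens’ concrete, the sketch below shows the general idea behind codec tokenization: each continuous feature frame is replaced by the index of its nearest entry in a codebook, so a clip becomes a sequence of integers a language model can predict. This is a generic nearest-neighbour (vector-quantization) illustration under stated assumptions; the codebook, the frames, and the `quantize`/`dequantize` helpers are hypothetical, and this is not the actual quantization scheme inside NVIDIA NanoCodec.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]

def quantize(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Map each continuous feature frame to the index of its nearest codebook vector.

    Illustrative only: plain nearest-neighbour lookup, not NanoCodec's real scheme.
    """
    tokens = []
    for frame in frames:
        # Squared Euclidean distance to every codebook entry; keep the closest index.
        distances = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        tokens.append(min(range(len(codebook)), key=distances.__getitem__))
    return tokens

def dequantize(tokens: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Approximately invert the mapping by looking each token back up in the codebook."""
    return [codebook[t] for t in tokens]

# Hypothetical 2-D codebook and "encoder output" frames, chosen for illustration.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.05), (0.9, 0.2), (0.05, 0.8), (0.7, 0.95)]

tokens = quantize(frames, codebook)
print(tokens)  # [0, 1, 2, 3]
```

Real codecs learn the encoder, codebook(s), and decoder jointly and usually emit several token streams per time step, but the core trade is the same: continuous audio in, a short integer sequence out.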

The system relies on a two-stage process:

  1. The Language Backbone: The model is built on LiquidAI’s LFM2 (350M) architecture. This backbone generates ‘audio intent’ by predicting the next audio tokens. Because LFMs (Liquid Foundation Models) are designed for efficiency, they provide a faster alternative to standard transformers.
  2. The Neural Codec: It uses the NVIDIA NanoCodec to turn these tokens into 22kHz waveforms.
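The two stages can be sketched as a simple loop: a language model predicts audio tokens one at a time, conditioned on the text and on the tokens generated so far, and a codec then turns the finished token sequence into waveform samples. In this minimal sketch `next_audio_token` stands in for the LFM2 backbone and `decode` for NanoCodec; both, along with the toy stubs `fake_lm` and `fake_codec`, are hypothetical and do not reflect the real Kani-TTS-2 interfaces.

```python
from typing import Callable, List

def synthesize(
    text_tokens: List[int],
    next_audio_token: Callable[[List[int]], int],
    decode: Callable[[List[int]], List[float]],
    end_token: int,
    max_tokens: int = 1000,
) -> List[float]:
    """Two-stage TTS loop: autoregressive token prediction, then codec decoding."""
    context = list(text_tokens)
    audio_tokens: List[int] = []
    # Stage 1 (backbone): predict audio tokens one by one until an end token appears.
    for _ in range(max_tokens):
        tok = next_audio_token(context)
        if tok == end_token:
            break
        audio_tokens.append(tok)
        context.append(tok)
    # Stage 2 (codec): turn the discrete token sequence into waveform samples.
    return decode(audio_tokens)

# --- Hypothetical stubs, for illustration only ---
END = 0
prompt = [7, 8, 9]             # made-up text-token ids
script = [100, 101, 102, END]  # what the stub "model" emits, in order

def fake_lm(context: List[int]) -> int:
    # Replays `script` based on how many audio tokens have been generated so far.
    return script[len(context) - len(prompt)]

def fake_codec(tokens: List[int], samples_per_token: int = 4) -> List[float]:
    # Each token becomes a short block of constant samples.
    return [t / 1000 for t in tokens for _ in range(samples_per_token)]

wave = synthesize(prompt, fake_lm, fake_codec, end_token=END)
print(len(wave))  # 12
```

Keeping the codec separate means the language model only has to reason over a compact token sequence, while waveform detail is left to a dedicated decoder.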

By using this architecture, the model captures human-like prosody (the rhythm and intonation of speech) without the ‘robotic’ artifacts found in older TTS systems.

Efficiency: 10,000 Hours in 6 Hours

The training metrics for Kani-TTS-2 are a masterclass in optimization. The English model was trained on 10,000 hours of high-quality speech data.

While that scale is impressive, the speed of training is the real story. The research team trained the model in only 6 hours using a cluster of 8 NVIDIA H100 GPUs. This shows that, when paired with an efficient architecture like LFM2, a large speech dataset no longer has to demand weeks of compute time.
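A quick back-of-the-envelope calculation puts the reported figures in perspective. The ratios below treat the 10,000 hours as a single pass over the data; the article does not report the number of epochs, so actual processing throughput could be a multiple of these numbers.

```python
# Figures as reported in the article.
audio_hours = 10_000   # English training set size
wall_clock_hours = 6   # reported training time
gpus = 8               # 8x NVIDIA H100

gpu_hours = wall_clock_hours * gpus                   # total compute budget: 48 GPU-hours
audio_per_gpu_hour = audio_hours / gpu_hours          # hours of speech per GPU-hour (one pass)
audio_per_wall_hour = audio_hours / wall_clock_hours  # hours of speech per wall-clock hour

print(gpu_hours, round(audio_per_gpu_hour, 1), round(audio_per_wall_hour))  # 48 208.3 1667
```

In other words, the whole run fits in roughly 48 GPU-hours, which works out to about 208 hours of speech per GPU-hour if the data was seen once.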

Zero-Shot Voice Cloning and Performance

The standout feature for developers is zero-shot voice cloning. Unlike traditional models that require fine-tuning for new voices, Kani-TTS-2 uses speaker embeddings.

  • How it works: You provide a short reference audio clip.
  • The result: The model extracts the unique characteristics of that voice and applies them to the generated text instantly.
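The key property of a speaker embedding is that a reference clip of any length collapses into one fixed-size vector, and clips of the same voice land close together. The sketch below illustrates that property with mean pooling and cosine similarity over made-up per-frame features. The article does not describe Kani-TTS-2’s speaker encoder, so `speaker_embedding` here is a hypothetical stand-in; real systems use a learned encoder network rather than a plain average.

```python
import math
from typing import List, Sequence

def speaker_embedding(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a variable-length clip (per-frame feature vectors) into one fixed-size,
    unit-length vector by mean pooling. Hypothetical stand-in for a learned encoder."""
    dim = len(frames[0])
    mean = [sum(f[i] for f in frames) / len(frames) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean]

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Both inputs are unit-length, so their dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

# Made-up features: two clips of speaker A (different lengths) and one of speaker B.
clip_a1 = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.0]]
clip_a2 = [[1.0, 0.15, 0.05], [0.95, 0.1, 0.0]]
clip_b = [[0.0, 0.2, 1.0], [0.1, 0.1, 0.9]]

e_a1, e_a2, e_b = (speaker_embedding(c) for c in (clip_a1, clip_a2, clip_b))
print(len(e_a1) == len(e_a2))                  # True: same size regardless of clip length
print(cosine(e_a1, e_a2) > cosine(e_a1, e_b))  # True: same speaker scores higher
```

Because the vector is computed on the fly from the reference clip and fed to the generator as conditioning, no weights change, which is what makes the cloning ‘zero-shot’.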

From a deployment perspective, the model is highly accessible:

  • Parameter Count: 400M (0.4B) parameters.
  • Speed: It features a Real-Time Factor (RTF) of 0.2. This means it can generate 10 seconds of speech in roughly 2 seconds.
  • Hardware: It requires only 3GB of VRAM, making it compatible with consumer-grade GPUs like the RTX 3060 or 4050.
  • License: Released under the Apache 2.0 license, allowing for commercial use.
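The speed and memory figures can be checked with simple arithmetic. RTF is synthesis time divided by the duration of the audio produced, so lower is faster. For memory, the sketch counts only the weights of a 400M-parameter model; the article does not state the inference precision, so 16-bit and 32-bit weights are shown as assumptions. Activations, caches, and the codec account for the rest of the 3GB budget. The helper names are illustrative.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced (lower is faster)."""
    return synthesis_seconds / audio_seconds

def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock time to synthesize `audio_seconds` of speech at a given RTF."""
    return audio_seconds * rtf

def weight_memory_gib(n_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone (no activations, caches, or codec), in GiB."""
    return n_params * bytes_per_param / 2**30

print(synthesis_time(10, 0.2))                # 2.0 seconds, matching the article
print(real_time_factor(2.0, 10.0))            # 0.2
print(round(weight_memory_gib(400e6, 2), 2))  # 0.75 GiB, assuming 16-bit weights
print(round(weight_memory_gib(400e6, 4), 2))  # 1.49 GiB, assuming 32-bit weights
```

Even at full 32-bit precision the weights alone stay well under 3GB, which is consistent with the model fitting on entry-level consumer GPUs.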

Key Takeaways

  • Efficient Architecture: The 400M-parameter model is built on LiquidAI’s LFM2 (350M) backbone. This ‘Audio-as-Language’ approach treats speech as discrete tokens, allowing for faster processing and more human-like intonation compared to traditional architectures.
  • Rapid Training at Scale: Kani-TTS-2-EN was trained on 10,000 hours of high-quality speech data in just 6 hours using 8 NVIDIA H100 GPUs.
  • Instant Zero-Shot Cloning: There is no need for fine-tuning to replicate a specific voice. Given a short reference audio clip, the model uses speaker embeddings to synthesize text in the target speaker’s voice immediately.
  • High Performance on Edge Hardware: With a Real-Time Factor (RTF) of 0.2, the model can generate 10 seconds of audio in roughly 2 seconds. It requires only 3GB of VRAM, making it fully functional on consumer-grade GPUs like the RTX 3060.
  • Developer-Friendly Licensing: Released under the Apache 2.0 license, Kani-TTS-2 is ready for commercial integration. It offers a local-first, low-latency alternative to expensive closed-source TTS APIs.



Michal Sutter is a data science professional with a Master of Science in Data Science from the University of Padova. With a solid foundation in statistical analysis, machine learning, and data engineering, Michal excels at transforming complex datasets into actionable insights.

2025 © NextTech-News. All Rights Reserved