Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU

By NextTech, November 12, 2025. 6 Mins Read

Maya Research has released Maya1, a 3B parameter text to speech model that turns text plus a short description into controllable, expressive speech while running in real time on a single GPU.

What Maya1 Actually Does

Maya1 is a state-of-the-art speech model for expressive voice generation. It is built to capture real human emotion and precise voice design from text inputs.

The core interface has 2 inputs:

  1. A natural language voice description, for example "Female voice in her 20s with a British accent, energetic, clear diction" or "Demon character, male voice, low pitch, gravelly timbre, slow pacing".
  2. The text that should be spoken

The model combines both signals and generates audio that matches the content and the described style. You can also insert inline emotion tags inside the text, with more than 20 emotions supported.

Maya1 outputs 24 kHz mono audio and supports real time streaming, which makes it suitable for assistants, interactive agents, games, podcasts and live content.

The Maya Research team claims that the model outperforms top proprietary systems while remaining fully open source under the Apache 2.0 license.

Architecture and SNAC Codec

Maya1 is a 3B parameter decoder-only transformer with a Llama-style backbone. Instead of predicting raw waveforms, it predicts tokens from a neural audio codec named SNAC.

The generation flow is

text → tokenize → generate SNAC codes (7 tokens per frame) → decode → 24 kHz audio

SNAC uses a multi-scale hierarchical structure at about 12, 23 and 47 Hz. This keeps the autoregressive sequence compact while preserving detail. The codec is designed for real time streaming at about 0.98 kbps.
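
The frame size and bitrate quoted above can be cross-checked with simple arithmetic. The sketch below is illustrative only: it uses the rounded level rates from the text and assumes each SNAC token comes from a 4096-entry codebook (12 bits), which this article does not state.

```python
import math

# Approximate token rates of the three SNAC levels, as quoted in the text (Hz).
LEVEL_RATES_HZ = [12, 23, 47]

# ASSUMPTION (not from the article): 4096 entries per codebook, i.e. 12 bits per token.
CODEBOOK_SIZE = 4096
BITS_PER_TOKEN = int(math.log2(CODEBOOK_SIZE))


def tokens_per_frame(rates):
    """Tokens emitted per step of the coarsest level, rounding each rate ratio."""
    coarsest = rates[0]
    return sum(round(rate / coarsest) for rate in rates)


def bitrate_bps(rates, bits_per_token):
    """Total bits per second summed over all levels."""
    return sum(rates) * bits_per_token


frame_tokens = tokens_per_frame(LEVEL_RATES_HZ)
bps = bitrate_bps(LEVEL_RATES_HZ, BITS_PER_TOKEN)
print(frame_tokens, bps)
```

Under these assumptions, one frame holds 1 + 2 + 4 = 7 tokens and the stream comes to 984 bits per second, which is in line with the quoted figure of about 0.98 kbps.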

The important point is that the transformer operates on discrete codec tokens instead of raw samples. A separate SNAC decoder, for example hubertsiuzdak/snac_24khz, reconstructs the waveform. This separation makes generation more efficient and easier to scale than direct waveform prediction.
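
Before the SNAC decoder can run, the flat stream of generated tokens has to be split back into the three codec levels. The following is a minimal sketch of that step, not the reference implementation. The slot order inside each 7 token frame is an assumption made for illustration, and the sketch ignores any vocabulary offset the real model may add to codec token IDs.

```python
def unpack_frames(tokens, frame_size=7):
    """Split a flat token list into three levels, coarse to fine.

    ASSUMED slot layout per frame (hypothetical, for illustration):
        [coarse, mid0, fine0, fine1, mid1, fine2, fine3]
    """
    # Drop a trailing partial frame, if any.
    usable = len(tokens) - len(tokens) % frame_size
    coarse, mid, fine = [], [], []
    for start in range(0, usable, frame_size):
        frame = tokens[start:start + frame_size]
        coarse.append(frame[0])
        mid.extend([frame[1], frame[4]])
        fine.extend([frame[2], frame[3], frame[5], frame[6]])
    return coarse, mid, fine


# 15 tokens = two full frames plus one leftover token that is discarded.
coarse, mid, fine = unpack_frames(list(range(15)))
print(coarse, mid, fine)
```

Each full frame contributes 1, 2 and 4 tokens to the three levels, matching the roughly 1:2:4 ratio of the level rates.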

Training Data And Voice Conditioning

Maya1 is pretrained on an internet scale English speech corpus to learn broad acoustic coverage and natural coarticulation. It is then fine tuned on a curated proprietary dataset of studio recordings that include human verified voice descriptions, more than 20 emotion tags per sample, multiple English accents, and character or role variations.

The documented data pipeline includes:

  1. 24 kHz mono resampling with loudness normalization to about -23 LUFS
  2. Voice activity detection with silence trimming between 1 and 14 seconds
  3. Forced alignment using Montreal Forced Aligner for phrase boundaries
  4. MinHash LSH text deduplication
  5. Chromaprint based audio deduplication
  6. SNAC encoding with 7 token frame packing
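
To make the text deduplication step more concrete, here is a small standard-library sketch of MinHash signatures with LSH banding. It illustrates the general technique only. The shingle size, number of hash functions and band count are arbitrary choices for the example, not values from the Maya1 pipeline.

```python
import hashlib


def shingles(text, n=3):
    """Set of word n-grams from lowercased text."""
    words = text.lower().split()
    if len(words) < n:
        return {" ".join(words)}
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def _hash(item, seed):
    """64-bit hash of a string, salted with the seed to act as one hash function."""
    salt = seed.to_bytes(4, "little")
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(items, num_perm=64):
    """Keep the minimum hash value per seeded hash function."""
    return [min(_hash(item, seed) for item in items) for seed in range(num_perm)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions approximates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_bands(signature, bands=16):
    """Cut the signature into bands; texts sharing any band are candidate duplicates."""
    rows = len(signature) // bands
    return {(b, tuple(signature[b * rows:(b + 1) * rows])) for b in range(bands)}


a = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
b = minhash_signature(shingles("the quick brown fox jumps over the lazy dog"))
c = minhash_signature(shingles("completely unrelated sentence about audio codecs today"))

print(estimated_jaccard(a, b), bool(lsh_bands(a) & lsh_bands(b)))
print(estimated_jaccard(a, c), bool(lsh_bands(a) & lsh_bands(c)))
```

In a real pipeline the bands would be stored in hash tables so that candidate pairs are found without comparing every pair of transcripts.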

The Maya Research team evaluated several ways to condition the model on a voice description. Simple colon formats and key value tag formats either caused the model to speak the description or did not generalize well. The best performing format uses an XML style attribute wrapper that encodes the description and text in a natural way while remaining robust.

In practice, this means developers can describe voices in free form text, close to how they would brief a voice actor, instead of learning a custom parameter schema.
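
As a rough illustration of what an XML style attribute wrapper could look like, the helper below joins a voice description and the text to speak into a single prompt string. The function name `build_prompt` and the exact tag layout are assumptions for this sketch. The model card should be treated as the authority on the real format.

```python
def build_prompt(description: str, text: str) -> str:
    """Wrap a voice description and the text to speak in one prompt string.

    ASSUMPTION: the wrapper is a single XML-style tag with a `description`
    attribute followed by the text. This layout is hypothetical.
    """
    # A double quote would end the attribute early, so swap it for a single quote.
    safe_description = description.replace('"', "'").strip()
    return f'<description="{safe_description}"> {text.strip()}'


prompt = build_prompt(
    "Female voice in her 20s with a British accent, energetic, clear diction",
    "Welcome back! Let's get started.",
)
print(prompt)
```

The description stays free form text, so changing the voice means editing a sentence rather than a set of numeric parameters.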

Source: https://huggingface.co/maya-research/maya1

Inference And Deployment On A Single GPU

The reference Python script on Hugging Face loads the model with AutoModelForCausalLM.from_pretrained("maya-research/maya1", torch_dtype=torch.bfloat16, device_map="auto") and uses the SNAC decoder from SNAC.from_pretrained("hubertsiuzdak/snac_24khz").
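
Put together, the two loading calls quoted above might be wrapped as follows. This is a sketch rather than the reference script: the helper name `load_maya1` is hypothetical, loading a tokenizer with `AutoTokenizer` is an assumption, and `device_map="auto"` generally requires the `accelerate` package to be installed.

```python
MODEL_ID = "maya-research/maya1"
SNAC_ID = "hubertsiuzdak/snac_24khz"


def load_maya1(model_id: str = MODEL_ID, snac_id: str = SNAC_ID):
    """Load the language model, a tokenizer, and the SNAC decoder.

    Heavy dependencies are imported inside the function so that importing
    this module stays cheap. Requires `torch`, `transformers` and `snac`.
    Calling it downloads the model weights on first use.
    """
    import torch
    from snac import SNAC
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # half-precision weights, as in the article
        device_map="auto",           # let the library place weights on the GPU
    )
    # ASSUMPTION: the tokenizer is published in the same repository as the model.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    snac_decoder = SNAC.from_pretrained(snac_id).eval()
    return model, tokenizer, snac_decoder
```

A caller would run `model, tokenizer, snac_decoder = load_maya1()` once at startup and reuse the three objects for every request.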

The Maya Research team recommends a single GPU with 16 GB or more of VRAM, for example A100, H100 or a consumer RTX 4090 class card.
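
A back-of-the-envelope estimate shows why 16 GB is a comfortable floor. The sketch assumes exactly 3 billion parameters stored in bfloat16 (2 bytes each). The real parameter count is only nominally 3B, and the KV cache, activations and the SNAC decoder need additional memory on top of the weights.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * bytes_per_param / 2**30


# ASSUMPTION: a nominal 3e9 parameters at 2 bytes each (bfloat16).
bf16_gib = weight_memory_gib(3e9, 2)
print(f"bf16 weights: about {bf16_gib:.1f} GiB")
```

The weights alone take well under half of a 16 GB card, which leaves room for the cache and decoding buffers during streaming.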

For production, they provide a vllm_streaming_inference.py script that integrates with vLLM. It supports Automatic Prefix Caching for repeated voice descriptions, a WebAudio ring buffer, multi GPU scaling and sub 100 millisecond latency targets for real time use.
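
A ring buffer is what lets playback consume audio at a steady rate while chunks arrive in bursts. The class below is a Python illustration of the data structure only. The actual WebAudio buffer runs in the browser, and the policy of overwriting the oldest samples when full is an assumption made for this sketch.

```python
class RingBuffer:
    """Fixed-capacity FIFO of audio samples that overwrites the oldest when full."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._data = [0.0] * capacity
        self._capacity = capacity
        self._start = 0  # index of the oldest valid sample
        self._size = 0   # number of valid samples

    def __len__(self):
        return self._size

    def write(self, samples):
        """Append samples, dropping the oldest ones if the buffer is full."""
        for sample in samples:
            end = (self._start + self._size) % self._capacity
            self._data[end] = sample
            if self._size < self._capacity:
                self._size += 1
            else:
                # The slot just written held the oldest sample, so move on.
                self._start = (self._start + 1) % self._capacity

    def read(self, n: int):
        """Remove and return up to n of the oldest samples."""
        n = min(n, self._size)
        out = [self._data[(self._start + i) % self._capacity] for i in range(n)]
        self._start = (self._start + n) % self._capacity
        self._size -= n
        return out


buf = RingBuffer(4)
buf.write([1, 2, 3])       # a decoded chunk arrives
first = buf.read(2)        # the player pulls two samples
buf.write([4, 5, 6])       # the next chunk wraps around the end
rest = buf.read(10)        # the player drains what is left
print(first, rest)
```

Because reads and writes only move two indices, no audio data is copied or reallocated while streaming.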

Beyond the core repository, they have released:

  • A Hugging Face Space that exposes an interactive browser demo where users enter text and voice descriptions and listen to the output
  • GGUF quantized variants of Maya1 for lighter deployments using llama.cpp
  • A ComfyUI node that wraps Maya1 as a single node, with emotion tag helpers and SNAC integration

These projects reuse the official model weights and interface, so they stay consistent with the main implementation.

Key Takeaways

  1. Maya1 is a 3B parameter, decoder-only, Llama-style text to speech model that predicts SNAC neural codec tokens instead of raw waveforms, and outputs 24 kHz mono audio with streaming support.
  2. The model takes 2 inputs, a natural language voice description and the target text, and supports more than 20 inline emotion tags for local control of expressiveness.
  3. Maya1 is trained with a pipeline that combines large scale English pretraining and studio quality fine tuning with loudness normalization, voice activity detection, forced alignment, text deduplication, audio deduplication and SNAC encoding.
  4. The reference implementation runs on a single GPU with 16 GB or more of VRAM using torch_dtype=torch.bfloat16, integrates with a SNAC decoder, and has a vLLM based streaming server with Automatic Prefix Caching for low latency deployment.
  5. Maya1 is released under the Apache 2.0 license, with official weights, a Hugging Face Space demo, GGUF quantized variants and ComfyUI integration, which makes expressive, emotion rich, controllable text to speech accessible for commercial and local use.

Maya1 pushes open source text to speech into territory that was previously dominated by proprietary APIs. A 3B parameter Llama-style decoder that predicts SNAC codec tokens, runs on a single 16 GB GPU with vLLM streaming and Automatic Prefix Caching, and exposes more than 20 inline emotions with natural language voice design is a practical building block for real time agents, games and tools. Overall, Maya1 shows that expressive, controllable TTS can be both open and production ready.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
