Maya1: A New Open Source 3B Voice Model For Expressive Text To Speech On A Single GPU

By NextTech, November 12, 2025
Maya Research has released Maya1, a 3B-parameter text-to-speech model that turns text plus a short description into controllable, expressive speech while running in real time on a single GPU.

What Does Maya1 Actually Do?

Maya Research describes Maya1 as a state-of-the-art speech model for expressive voice generation, built to capture real human emotion and precise voice design from text inputs.

The core interface has two inputs:

  1. A natural language voice description, for example "Female voice in her 20s with a British accent, energetic, clear diction" or "Demon character, male voice, low pitch, gravelly timbre, slow pacing".
  2. The text that should be spoken.

The model combines both signals and generates audio that matches the content and the described style. You can also insert inline emotion tags inside the text; the model supports more than 20 emotions.
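Because emotion tags live inside the text itself, it can help to lint a script before sending it to the model. The sketch below is a hypothetical helper, assuming tags are written in angle brackets as lowercase names with underscores (for example `<laugh>`); the tag names used here are illustrative assumptions, so check the model card for the supported list.

```python
import re

# Assumption: inline emotion tags look like <tag_name> (lowercase, underscores).
TAG_PATTERN = re.compile(r"<([a-z_]+)>")


def split_emotion_tags(text: str) -> list[tuple[str, str]]:
    """Split text into ordered ("text", chunk) and ("tag", name) segments."""
    segments: list[tuple[str, str]] = []
    pos = 0
    for match in TAG_PATTERN.finditer(text):
        if match.start() > pos:
            segments.append(("text", text[pos:match.start()]))
        segments.append(("tag", match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append(("text", text[pos:]))
    return segments


def unknown_tags(text: str, allowed: set[str]) -> set[str]:
    """Return tag names in text that are not in the (hypothetical) allowed set."""
    return {
        name
        for kind, name in split_emotion_tags(text)
        if kind == "tag" and name not in allowed
    }


if __name__ == "__main__":
    print(split_emotion_tags("Well <laugh> that was close."))
    print(unknown_tags("Hi <sigh> there <foo>", {"laugh", "sigh"}))
```

Validating tags up front catches typos that the model might otherwise read aloud as literal text.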

Maya1 outputs 24 kHz mono audio and supports real-time streaming, which makes it suitable for assistants, interactive agents, games, podcasts and live content.
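On the client side, streamed 24 kHz mono output can be appended to a WAV file chunk by chunk with nothing but the Python standard library. This is a minimal sketch, assuming the audio arrives as lists of floats in [-1, 1] and that 16-bit PCM is an acceptable container format; a sine tone stands in for real model output.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 24_000  # Maya1 output rate stated in the article
NUM_CHANNELS = 1      # mono
SAMPLE_WIDTH = 2      # 16-bit PCM (an assumption for this sketch)


def float_to_pcm16(samples) -> bytes:
    """Clip float samples to [-1, 1] and pack them as little-endian int16."""
    return b"".join(
        struct.pack("<h", int(round(max(-1.0, min(1.0, s)) * 32767)))
        for s in samples
    )


def write_stream_to_wav(fileobj, chunks) -> None:
    """Append each incoming chunk of float samples to a mono 24 kHz WAV."""
    with wave.open(fileobj, "wb") as wav:
        wav.setnchannels(NUM_CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        for chunk in chunks:
            wav.writeframes(float_to_pcm16(chunk))


def sine_tone(freq_hz: float, seconds: float, amplitude: float = 0.5) -> list:
    """Test signal standing in for model output."""
    n = int(SAMPLE_RATE * seconds)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


if __name__ == "__main__":
    buf = io.BytesIO()
    tone = sine_tone(440, 1.0)
    write_stream_to_wav(buf, [tone[:12_000], tone[12_000:]])  # two "streamed" chunks
    buf.seek(0)
    with wave.open(buf, "rb") as wav:
        print(wav.getframerate(), wav.getnchannels(), wav.getnframes())  # prints: 24000 1 24000
```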

The Maya Research team claims that the model outperforms top proprietary systems while remaining fully open source under the Apache 2.0 license.

Structure and SNAC Codec

Maya1 is a 3B-parameter decoder-only transformer with a Llama-style backbone. Instead of predicting raw waveforms, it predicts tokens from a neural audio codec named SNAC.

The generation flow is:

text → tokenize → generate SNAC codes (7 tokens per frame) → decode → 24 kHz audio

SNAC uses a multi-scale hierarchical structure at about 12, 23 and 47 Hz. This keeps the autoregressive sequence compact while preserving detail. The codec is designed for real-time streaming at about 0.98 kbps.
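The three rates, the 7 tokens per frame and the roughly 0.98 kbps bitrate fit together arithmetically. The sketch below makes that explicit under two assumptions not stated in the article: the three levels run at 24000/2048, 24000/1024 and 24000/512 Hz, and each codebook has 4096 entries (12 bits per token).

```python
import math

SAMPLE_RATE = 24_000
HOPS = (2048, 1024, 512)  # coarse, mid, fine hop sizes in samples (assumed)
CODEBOOK_SIZE = 4096      # entries per codebook (assumed)

# Frame rate of each level: about 11.7, 23.4 and 46.9 Hz ("12, 23 and 47 Hz").
level_rates = [SAMPLE_RATE / hop for hop in HOPS]

# Per coarse frame there is 1 coarse, 2 mid and 4 fine token: 7 in total.
tokens_per_frame = sum(HOPS[0] // hop for hop in HOPS)

# Total tokens the transformer must emit per second of audio (about 82).
tokens_per_second = sum(level_rates)

# Each token indexes a 4096-entry codebook, i.e. carries 12 bits.
bits_per_token = math.log2(CODEBOOK_SIZE)

# About 0.98 kbps, matching the figure quoted for the codec.
bitrate_kbps = tokens_per_second * bits_per_token / 1000

# Raw 24 kHz audio would need roughly 290x more autoregressive steps.
steps_saved = SAMPLE_RATE / tokens_per_second

if __name__ == "__main__":
    print([round(r, 2) for r in level_rates], tokens_per_frame)
    print(round(tokens_per_second, 2), bits_per_token, round(bitrate_kbps, 3))
    print(round(steps_saved, 1))
```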

The important point is that the transformer operates on discrete codec tokens instead of raw samples. A separate SNAC decoder, for example hubertsiuzdak/snac_24khz, reconstructs the waveform. This separation makes generation more efficient and easier to scale than direct waveform prediction.
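Because the transformer emits one flat token stream while the SNAC decoder expects three separate code levels, the stream has to be unpacked before decoding. The frame layout below (coarse, mid, fine, fine, mid, fine, fine) is a hypothetical interleaving chosen for illustration; Maya1's actual ordering and its mapping from vocabulary IDs to raw codebook indices should be taken from the official script. The optional `decode_with_snac` helper assumes the `snac` package's `SNAC.decode` accepts a list of per-level code tensors, and it imports torch and snac lazily so the packing logic runs without them.

```python
def pack_frames(coarse: list[int], mid: list[int], fine: list[int]) -> list[int]:
    """Interleave three SNAC levels (lengths n, 2n, 4n) into 7-token frames."""
    n = len(coarse)
    if len(mid) != 2 * n or len(fine) != 4 * n:
        raise ValueError("expected level lengths n, 2n, 4n")
    flat: list[int] = []
    for i in range(n):
        # Hypothetical order: coarse, mid, fine, fine, mid, fine, fine.
        flat += [coarse[i], mid[2 * i], fine[4 * i], fine[4 * i + 1],
                 mid[2 * i + 1], fine[4 * i + 2], fine[4 * i + 3]]
    return flat


def unpack_frames(flat: list[int]) -> tuple[list[int], list[int], list[int]]:
    """Invert pack_frames: split a flat stream back into three levels."""
    if len(flat) % 7:
        raise ValueError("token count must be a multiple of 7")
    coarse: list[int] = []
    mid: list[int] = []
    fine: list[int] = []
    for j in range(0, len(flat), 7):
        f = flat[j:j + 7]
        coarse.append(f[0])
        mid += [f[1], f[4]]
        fine += [f[2], f[3], f[5], f[6]]
    return coarse, mid, fine


def decode_with_snac(flat: list[int]):
    """Decode raw codebook indices (not LLM vocabulary IDs) to a waveform tensor."""
    import torch
    from snac import SNAC

    snac = SNAC.from_pretrained("hubertsiuzdak/snac_24khz").eval()
    # One (batch=1, T_level) tensor per level, coarse to fine.
    codes = [torch.tensor(level, dtype=torch.long).unsqueeze(0)
             for level in unpack_frames(flat)]
    with torch.inference_mode():
        return snac.decode(codes)
```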

Training Data And Voice Conditioning

Maya1 is pretrained on an internet-scale English speech corpus to learn broad acoustic coverage and natural coarticulation. It is then fine-tuned on a curated proprietary dataset of studio recordings that include human-verified voice descriptions, more than 20 emotion tags per sample, multiple English accents, and character or role variations.

The documented data pipeline includes:

  1. Resampling to 24 kHz mono with loudness normalized to about -23 LUFS
  2. Voice activity detection with silence trimming, keeping clips between 1 and 14 seconds long
  3. Forced alignment using the Montreal Forced Aligner for word boundaries
  4. MinHash-LSH text deduplication
  5. Chromaprint-based audio deduplication
  6. SNAC encoding with 7-token frame packing
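Step 4 refers to MinHash with locality-sensitive hashing, a standard way to find near-duplicate transcripts without comparing every pair. The following is a generic, minimal sketch of that technique, not Maya Research's pipeline: the word 3-gram shingles, 64 hash functions, 16 bands and 0.8 similarity threshold are all assumed values.

```python
import hashlib

NUM_PERM = 64   # number of hash functions in each signature (assumed)
BANDS = 16      # LSH bands; rows per band = NUM_PERM // BANDS (assumed)
SHINGLE = 3     # word n-gram size (assumed)


def shingles(text: str) -> set[str]:
    """Lowercased word n-grams of the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + SHINGLE])
            for i in range(max(1, len(words) - SHINGLE + 1))}


def _hash(seed: int, s: str) -> int:
    """Seeded 64-bit hash; each seed acts as one independent hash function."""
    digest = hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little")


def minhash(text: str) -> list[int]:
    """Signature: minimum hash over all shingles, for each seed."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_PERM)]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Fraction of matching signature slots estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def lsh_dedup(texts: list[str], threshold: float = 0.8) -> list[int]:
    """Return indices of texts kept after dropping near-duplicates."""
    rows = NUM_PERM // BANDS
    buckets: dict[tuple, list[int]] = {}  # band key -> kept indices
    sigs: dict[int, list[int]] = {}
    kept: list[int] = []
    for idx, text in enumerate(texts):
        sig = minhash(text)
        keys = [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(BANDS)]
        # Only texts sharing at least one band are compared in full.
        candidates = {j for k in keys for j in buckets.get(k, [])}
        if any(estimated_jaccard(sig, sigs[j]) >= threshold for j in candidates):
            continue
        for k in keys:
            buckets.setdefault(k, []).append(idx)
        sigs[idx] = sig
        kept.append(idx)
    return kept


if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "The quick brown fox jumps over the lazy dog",
        "an entirely different sentence about speech synthesis models",
    ]
    print(lsh_dedup(corpus))
```

Banding trades recall for speed: more rows per band make accidental candidate matches rarer, while the final signature comparison filters out false positives.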

The Maya Research team evaluated several ways to condition the model on a voice description. Simple colon formats and key-value tag formats either caused the model to speak the description aloud or did not generalize well. The best-performing format uses an XML-style attribute wrapper that encodes the description and text naturally while remaining robust.
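A prompt builder for an XML-style attribute wrapper can be very small. The sketch below assumes the form `<description="..."> text`, which is how we understand the model card's example; the exact wrapper and any special tokens placed around it should be taken from the official script. Collapsing whitespace and swapping double quotes for single quotes are design choices of this hypothetical helper, meant to keep the attribute value well-formed.

```python
def build_prompt(description: str, text: str) -> str:
    """Hypothetical helper: wrap a free-form voice brief in an XML-style attribute."""
    # Collapse newlines and repeated spaces so a multi-line brief is one value.
    desc = " ".join(description.split())
    # A literal double quote would close the attribute early; use single quotes.
    desc = desc.replace('"', "'")
    return f'<description="{desc}"> {text.strip()}'


if __name__ == "__main__":
    print(build_prompt(
        "Female voice in her 20s with a British accent,\n energetic, clear diction",
        "Welcome back to the show.",
    ))
```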

In practice, this means developers can describe voices in free-form text, close to how they would brief a voice actor, instead of learning a custom parameter schema.

Source: https://huggingface.co/maya-research/maya1

Inference And Deployment On A Single GPU

The reference Python script on Hugging Face loads the model with AutoModelForCausalLM.from_pretrained("maya-research/maya1", torch_dtype=torch.bfloat16, device_map="auto") and uses the SNAC decoder from SNAC.from_pretrained("hubertsiuzdak/snac_24khz").
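Those two loading calls can be wrapped as follows. This is a minimal sketch: the `from_pretrained` arguments come from the article, while `AutoTokenizer` loading, the sampling settings (`temperature=0.4`, `top_p=0.9`, `max_new_tokens=2048`) and the helper names are hypothetical. The official script may also add special tokens around the prompt, and converting generated vocabulary IDs into SNAC codes requires its token offsets, which are not reproduced here. Imports sit inside the functions so the module loads without torch, transformers, snac or accelerate (needed for `device_map="auto"`) installed.

```python
MODEL_ID = "maya-research/maya1"
SNAC_ID = "hubertsiuzdak/snac_24khz"


def load_maya1():
    """Load Maya1, its tokenizer and the SNAC decoder (heavy imports are lazy)."""
    import torch
    from snac import SNAC
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    snac_model = SNAC.from_pretrained(SNAC_ID).eval()
    if torch.cuda.is_available():
        snac_model = snac_model.to("cuda")
    return model, tokenizer, snac_model


def generate_codec_tokens(model, tokenizer, prompt: str, max_new_tokens: int = 2048) -> list[int]:
    """Sample new token IDs after the prompt (sampling values are hypothetical)."""
    import torch

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.inference_mode():
        out = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=True,
            temperature=0.4,
            top_p=0.9,
        )
    # Drop the prompt; what remains are vocabulary IDs, not raw SNAC codes.
    return out[0, inputs["input_ids"].shape[1]:].tolist()
```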

The Maya Research team recommends a single GPU with 16 GB or more of VRAM, for example an A100, an H100 or a consumer RTX 4090-class card.
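A back-of-the-envelope estimate shows why 16 GB leaves comfortable headroom. The weight figure follows directly from 3B parameters at 2 bytes each in bfloat16. The KV-cache figure assumes a Llama-3.2-3B-like shape (28 layers, 8 KV heads, head dimension 128), which the article does not state, so treat it as illustrative.

```python
NUM_PARAMS = 3e9  # "3B parameter" from the article
BF16_BYTES = 2    # torch.bfloat16 stores 2 bytes per value

# Hypothetical architecture numbers (assumed, not from the article).
NUM_LAYERS = 28
NUM_KV_HEADS = 8
HEAD_DIM = 128


def weight_memory_gb(num_params: float = NUM_PARAMS, bytes_per_param: int = BF16_BYTES) -> float:
    """Memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


def kv_cache_bytes_per_token(layers: int = NUM_LAYERS, kv_heads: int = NUM_KV_HEADS,
                             head_dim: int = HEAD_DIM, bytes_per_value: int = BF16_BYTES) -> int:
    """Keys plus values (factor 2), stored for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def kv_cache_gb(num_tokens: int) -> float:
    """KV-cache size for a sequence of num_tokens, in decimal gigabytes."""
    return num_tokens * kv_cache_bytes_per_token() / 1e9


if __name__ == "__main__":
    print(f"weights (bf16): {weight_memory_gb():.1f} GB")
    print(f"weights (fp32): {weight_memory_gb(NUM_PARAMS, 4):.1f} GB")
    print(f"KV cache, 4096 tokens: {kv_cache_gb(4096):.2f} GB")
```

Under these assumptions the weights take about 6 GB and a 4096-token KV cache about 0.47 GB. The remaining budget covers activations, the SNAC decoder, the CUDA context and any batching.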

For production, they provide a vllm_streaming_inference.py script that integrates with vLLM. It supports Automatic Prefix Caching for repeated voice descriptions, a WebAudio ring buffer, multi-GPU scaling and sub-100 ms latency targets for real-time use.
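A ring buffer decouples bursty generation from steady playback. The official one is browser-side WebAudio code; the sketch below is only a Python analogue of the idea, not their implementation. Its design choices are assumptions: writes apply backpressure (they accept only what fits rather than overwriting unplayed audio), and reads always return a fixed-size block, padding with silence on underrun as an audio callback would need.

```python
class AudioRingBuffer:
    """Fixed-capacity FIFO of audio samples for smoothing streamed playback."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = [0.0] * capacity
        self._capacity = capacity
        self._read = 0      # index of the oldest unread sample
        self._size = 0      # number of unread samples
        self.underruns = 0  # reads that had to be padded with silence

    def __len__(self) -> int:
        return self._size

    def write(self, samples) -> int:
        """Copy as many samples as fit; return the count accepted (backpressure)."""
        accepted = 0
        for s in samples:
            if self._size == self._capacity:
                break
            self._buf[(self._read + self._size) % self._capacity] = s
            self._size += 1
            accepted += 1
        return accepted

    def read(self, n: int) -> list[float]:
        """Return exactly n samples, padding with silence (0.0) on underrun."""
        available = min(n, self._size)
        out = [self._buf[(self._read + i) % self._capacity] for i in range(available)]
        self._read = (self._read + available) % self._capacity
        self._size -= available
        if available < n:
            self.underruns += 1
            out.extend([0.0] * (n - available))
        return out
```

A producer that receives a smaller count from `write` than it offered should retry the remainder later, which naturally throttles generation to the playback rate.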

Beyond the core repository, they have released:

  • A Hugging Face Space that exposes an interactive browser demo where users enter text and voice descriptions and listen to the output
  • GGUF quantized variants of Maya1 for lighter deployments using llama.cpp
  • A ComfyUI node that wraps Maya1 as a single node, with emotion tag helpers and SNAC integration

These projects reuse the official model weights and interface, so they stay consistent with the main implementation.

Key Takeaways

  1. Maya1 is a 3B-parameter, decoder-only, Llama-style text-to-speech model that predicts SNAC neural codec tokens instead of raw waveforms, and outputs 24 kHz mono audio with streaming support.
  2. The model takes two inputs, a natural language voice description and the target text, and supports more than 20 inline emotion tags for local control of expressiveness.
  3. Maya1 is trained with a pipeline that combines large-scale English pretraining and studio-quality fine-tuning with loudness normalization, voice activity detection, forced alignment, text deduplication, audio deduplication and SNAC encoding.
  4. The reference implementation runs on a single GPU with 16 GB or more of VRAM using torch_dtype=torch.bfloat16, integrates with a SNAC decoder, and includes a vLLM-based streaming server with Automatic Prefix Caching for low-latency deployment.
  5. Maya1 is released under the Apache 2.0 license, with official weights, a Hugging Face Space demo, GGUF quantized variants and ComfyUI integration, which makes expressive, emotion-rich, controllable text-to-speech accessible for commercial and local use.

Maya1 pushes open source text-to-speech into territory previously dominated by proprietary APIs. A 3B-parameter Llama-style decoder that predicts SNAC codec tokens, runs on a single 16 GB GPU with vLLM streaming and Automatic Prefix Caching, and exposes more than 20 inline emotions with natural language voice design is a practical building block for real-time agents, games and tools. Overall, Maya1 shows that expressive, controllable TTS can be both open and production-ready.



Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform has over 2 million monthly views, illustrating its popularity among audiences.
