NVIDIA Just Released Audio Flamingo 3: An Open-Source Model Advancing Audio General Intelligence

By NextTech, July 16, 2025

Heard about Artificial General Intelligence (AGI)? Meet its auditory counterpart: Audio General Intelligence. With Audio Flamingo 3 (AF3), NVIDIA introduces a major leap in how machines understand and reason about sound. While past models could transcribe speech or classify audio clips, they lacked the ability to interpret audio in a context-rich, human-like way across speech, ambient sound, and music, and over extended durations. AF3 changes that.

AF3 is a fully open-source large audio-language model (LALM) that not only hears but also understands and reasons. Built on a five-stage curriculum and powered by the AF-Whisper encoder, it supports long audio inputs (up to 10 minutes), multi-turn multi-audio chat, on-demand thinking, and even voice-to-voice interactions. This sets a new bar for how AI systems interact with sound, bringing us a step closer to AGI.


The Core Innovations Behind Audio Flamingo 3

  1. AF-Whisper: A Unified Audio Encoder. AF3 uses AF-Whisper, a novel encoder adapted from Whisper-v3. It processes speech, ambient sounds, and music using the same architecture, fixing a major limitation of earlier LALMs, which used separate encoders and suffered from inconsistencies as a result. AF-Whisper leverages audio-caption datasets, synthesized metadata, and a dense 1280-dimension embedding space to align with text representations.
  2. Chain-of-Thought for Audio: On-Demand Reasoning. Unlike static QA systems, AF3 is equipped with 'thinking' capabilities. Using the AF-Think dataset (250k examples), the model can perform chain-of-thought reasoning when prompted, enabling it to explain its inference steps before arriving at an answer, a key step toward transparent audio AI.
  3. Multi-Turn, Multi-Audio Conversations. Through the AF-Chat dataset (75k dialogues), AF3 can hold contextual conversations involving multiple audio inputs across turns. This mimics real-world interactions, where humans refer back to earlier audio cues. It also introduces voice-to-voice conversations using a streaming text-to-speech module.
  4. Long Audio Reasoning. AF3 is the first fully open model capable of reasoning over audio inputs up to 10 minutes. Trained with LongAudio-XL (1.25M examples), the model supports tasks like meeting summarization, podcast understanding, sarcasm detection, and temporal grounding.
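
To make the 10-minute figure concrete, here is a minimal sketch of how a long clip might be cut into fixed-size windows before encoding. The 16 kHz sample rate and 30-second window match what Whisper-style encoders consume, but whether AF3 windows its input exactly this way is an assumption, and `split_into_windows` is a hypothetical helper, not part of NVIDIA's released code.

```python
def split_into_windows(num_samples, sample_rate=16_000, window_seconds=30):
    """Return (start, end) sample-index pairs that cover the whole clip.

    Assumption: non-overlapping windows; the last window may be shorter.
    """
    if num_samples < 0 or sample_rate <= 0 or window_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and window must be > 0")
    window = sample_rate * window_seconds
    return [
        (start, min(start + window, num_samples))
        for start in range(0, num_samples, window)
    ]


# A 10-minute clip at 16 kHz splits into 20 windows of 30 seconds each.
ten_minutes = 10 * 60 * 16_000
windows = split_into_windows(ten_minutes)
print(len(windows))  # 20
```

Under these assumptions, a model that reasons over 10 minutes of audio has to relate information across 20 encoder windows rather than a single one.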

State-of-the-Art Benchmarks and Real-World Capability

AF3 surpasses both open and closed models on over 20 benchmarks, including:

  • MMAU (avg): 73.14% (+2.14% over Qwen2.5-O)
  • LongAudioBench: 68.6 (GPT-4o evaluation), beating Gemini 2.5 Pro
  • LibriSpeech (ASR): 1.57% WER, outperforming Phi-4-mm
  • ClothoAQA: 91.1% (vs. 89.2% from Qwen2.5-O)
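
For readers unfamiliar with the WER figure above: word error rate is the word-level edit distance (substitutions, deletions, and insertions) between a transcript and its reference, divided by the number of reference words. The sketch below is a generic illustration of that definition, not the evaluation script used for AF3; the function name and example sentences are made up for this illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.1%}")  # 33.3%
```

A 1.57% WER therefore corresponds to roughly one or two word errors per hundred reference words.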

These improvements aren't just marginal; they redefine what's expected from audio-language systems. AF3 also introduces benchmarking in voice chat and speech generation, achieving 5.94s generation latency (vs. 14.62s for Qwen2.5) and better similarity scores.

The Data Pipeline: Datasets That Teach Audio Reasoning

NVIDIA didn't just scale compute; they rethought the data:

  • AudioSkills-XL: 8M examples combining ambient, music, and speech reasoning.
  • LongAudio-XL: Covers long-form speech from audiobooks, podcasts, and meetings.
  • AF-Think: Promotes short CoT-style inference.
  • AF-Chat: Designed for multi-turn, multi-audio conversations.

Each dataset is fully open-sourced, along with training code and recipes, enabling reproducibility and future research.

Open Source

AF3 is not just a model drop. NVIDIA released:

  • Model weights
  • Training recipes
  • Inference code
  • Four open datasets

This transparency makes AF3 the most accessible state-of-the-art audio-language model. It opens new research directions in auditory reasoning, low-latency audio agents, music comprehension, and multi-modal interaction.

Conclusion: Toward General Audio Intelligence

Audio Flamingo 3 demonstrates that deep audio understanding is not just possible but reproducible and open. By combining scale, novel training strategies, and diverse data, NVIDIA delivers a model that listens, understands, and reasons in ways earlier LALMs could not.


Check out the Paper, Code, and Model on Hugging Face. All credit for this research goes to the researchers of this project.

Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.
