Google DeepMind enhances Gemini audio models for natural voice interactions and translation

By PulseAugur Editorial · [2 sources] · 2025-06-03 17:15

Google DeepMind has released upgraded Gemini 2.5 audio models, improving both live voice agents and text-to-speech generation. The Gemini 2.5 Flash Native Audio model now offers better function calling, instruction following, and conversational context awareness, scoring 71.5% on the ComplexFuncBench Audio benchmark. New live speech translation features are also rolling out in the Google Translate app, enabling real-time speech-to-speech translation that preserves the speaker's intonation and pitch.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Google DeepMind TIER_1 English(EN) · 2025-12-12 17:50

Improved Gemini audio models for powerful voice experiences

Google DeepMind TIER_1 English(EN) · 2025-06-03 17:15

Advanced audio dialog and generation with Gemini 2.5

Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.
