PulseAugur

Google DeepMind enhances Gemini audio models for natural voice interactions and translation

Google DeepMind has released upgraded Gemini 2.5 audio models, enhancing capabilities for both live voice agents and text-to-speech generation. The Gemini 2.5 Flash Native Audio model now offers improved function calling, instruction following, and conversational context awareness, achieving a 71.5% score on the ComplexFuncBench Audio benchmark. Additionally, new live speech translation features are rolling out in the Google Translate app, enabling real-time speech-to-speech translation that preserves speaker intonation and pitch.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. Google DeepMind TIER_1 English(EN)

    Improved Gemini audio models for powerful voice experiences

  2. Google DeepMind TIER_1 English(EN)

    Advanced audio dialog and generation with Gemini 2.5

    Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.