PulseAugur
Google DeepMind enhances Gemini audio models for natural voice interactions and translation

Google DeepMind has released upgraded Gemini 2.5 audio models that improve both live voice agents and text-to-speech generation. The Gemini 2.5 Flash Native Audio model now offers better function calling, instruction following, and conversational context awareness, and scores 71.5% on the ComplexFuncBench Audio benchmark. New live speech translation features are also rolling out in the Google Translate app, enabling real-time speech-to-speech translation that preserves the speaker's intonation and pitch.

Ranking rationale: Frontier-lab model release with system card.

Read at Google DeepMind →

AI-generated summary · Google Gemini · From 2 sources.

Sources [2]

  1. Google DeepMind (Tier 1, English)

    Improved Gemini audio models for powerful voice experiences

  2. Google DeepMind (Tier 1, English)

    Advanced audio dialog and generation with Gemini 2.5

    Gemini 2.5 has new capabilities in AI-powered audio dialog and generation.