PulseAugur

Voice AI systems fail to act on detected emotions, research finds

A new research paper evaluates four leading real-time voice AI systems from OpenAI, Google, and Alibaba and reports a significant "emotional intelligence gap." The systems can often perceive emotions such as distress or sarcasm in a caller's voice but fail to act on them, prioritizing the literal words spoken instead. The disconnect appeared across scenarios involving crying callers, frightened voices authorizing transfers, and sarcastic agreement, which suggests that current voice AI often treats speech as a transcript rather than as communication carried by both words and delivery. Explicitly prompting the systems to attend to vocal delivery helps somewhat, but the improvement is inconsistent, so caution is warranted when deploying them in contexts where tone and emotion are critical.

IMPACT Current voice AI systems may fail to act on critical emotional cues, so caution is needed in sensitive applications.

RANK_REASON Academic paper evaluating existing AI systems.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Martijn Bartelds, Federico Bianchi, James Zou

    Real-Time Voice AI Hears but Does Not Listen

    arXiv:2606.26083v1. Abstract: Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on …

  2. arXiv cs.CL TIER_1 English(EN) · James Zou

    Real-Time Voice AI Hears but Does Not Listen

    Speech conveys information through both words and vocal delivery. We evaluate four leading production realtime voice systems (OpenAI's GPT Realtime 2, Google's Gemini 3.1 Flash Live, and Alibaba's Qwen3.5 Omni Plus and Omni Flash) on tasks where the words and the delivery patterns …