Brief · PulseAugur

RESEARCH · Hugging Face Blog · English (EN) · 3d · [2 sources]

Can Voice Agents Handle Bilingual Customers? Benchmarking Frontier ASR on Code-Switched Speech

Hugging Face has developed a benchmark that evaluates how well automatic speech recognition (ASR) systems handle code-switched speech, in which speakers switch between languages mid-sentence. This capability matters for voice agents that serve bilingual customers. The benchmark covers language pairs such as Spanish-English and French-English and draws its test scenarios from HR and IT service management. Top-performing models identified include ElevenLabs Scribe V2, Gemini 3 Flash, and Assembly AI Universal 3-Pro; results are reported as Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER).
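For readers unfamiliar with the first of these metrics, the sketch below computes standard WER: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and an ASR hypothesis, divided by the number of reference words. It is a minimal illustration, not the benchmark's own scoring code. The function name, the lowercase-and-split normalization, and the code-switched example sentences are assumptions made for this example; SWER and AER are not sketched because the brief does not define them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    real evaluations usually apply a more careful text normalizer.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)


# Illustrative (made-up) Spanish-English code-switched utterance.
reference = "necesito resetear mi password para el laptop"
hypothesis = "necesito resetear mi pasword para la laptop"
wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

In this example two of the seven reference words are mis-transcribed, so the WER is 2/7. Note that one error falls on the English word inside a Spanish sentence, which is the kind of mistake a code-switching benchmark is designed to surface.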

IMPACT Sets a new standard for evaluating voice agents in multilingual enterprise environments, potentially driving improvements in ASR for global customer service.

OpenAI
Hugging Face
Gemini 3 Flash
GPT-5
ElevenLabs Scribe V2
ElevenLabs Multilingual V2
Assembly AI Universal 3-Pro