Hugging Face has developed a benchmark to evaluate how well automatic speech recognition (ASR) systems handle code-switched speech, in which speakers switch between languages mid-sentence. Handling such speech accurately matters for voice agents that serve bilingual customer bases. The benchmark covers language pairs such as Spanish-English and French-English and uses HR and IT service management scenarios. The top-performing models include ElevenLabs Scribe V2, Gemini 3 Flash, and Assembly AI Universal 3-Pro. Results are reported using Word Error Rate (WER), Semantic Word Error Rate (SWER), and Answer Error Rate (AER).
IMPACT Sets a new standard for evaluating voice agents in multilingual enterprise environments, potentially driving improvements in ASR for global customer service.
RANK_REASON The cluster describes a new benchmark and dataset for evaluating ASR systems on code-switched speech, along with performance results for several models.
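For readers unfamiliar with the first metric named in the summary, the following is a minimal sketch of the standard Word Error Rate calculation: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name, the lowercase-and-split normalization, and the sample sentences are illustrative assumptions, not the benchmark's actual scoring code. SWER and AER are not shown because the summary does not define them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance / number of reference words.

    Normalization here (lowercasing, whitespace split) is an assumption;
    the benchmark may normalize text differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,                      # deletion
                    curr[j - 1] + 1,                  # insertion
                    prev[j - 1] + substitution_cost,  # match or substitution
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical code-switched utterance (Spanish-English), invented for illustration.
reference = "can you reset mi contraseña please"
hypothesis = "can you reset me contraseña"
wer = word_error_rate(reference, hypothesis)
```

In this made-up example the transcript has one substitution ("mi" became "me") and one deletion ("please") against six reference words, so the score is 2/6. A plain WER like this penalizes every word mismatch equally, which is presumably why the benchmark also reports semantic and answer-level error rates.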
- Assembly AI Universal 3-Pro
- ElevenLabs Multilingual V2
- ElevenLabs Scribe V2
- Gemini 3 Flash
- GPT-5
- Hugging Face
- OpenAI
AI-generated summary · Google Gemini · from 2 sources.