Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German
A new benchmark study evaluated five commercial automatic speech recognition (ASR) systems on code-switching speech, specifically focusing on Arabic, Persian, and German mixed with English. The research introduced a novel pipeline using GPT-4o and Gemini 1.5 Pro to score transcripts, reducing LLM costs by 91% and employing BERTScore as a more reliable metric than traditional Word Error Rate (WER) for certain language pairs. ElevenLabs Scribe v2 emerged as the top performer, achieving the lowest WER and highest BERTScore across all tested language pairs. AI
IMPACT This research highlights the challenges in ASR for code-switching and introduces a more robust evaluation method, potentially guiding future development of multilingual speech technologies.