New benchmark tackles ASR bias in Indic languages

By PulseAugur Editorial Team · [2 sources] · 2026-05-13 06:55

Researchers have developed Vividh-ASR, a benchmark for evaluating automatic speech recognition (ASR) models on Indic languages, specifically Hindi and Malayalam. The benchmark sorts audio into four complexity tiers (studio, broadcast, spontaneous, and synthetic noise). Its purpose is to diagnose "studio-bias," in which fine-tuned models perform well on read speech but poorly on spontaneous audio. The study found that specific training strategies significantly improve performance, especially on spontaneous speech. These include large early parameter updates and a hard-to-easy curriculum. The authors also introduced a parameter-efficient training recipe, Reverse Multi-Stage Fine-Tuning (R-MFT), which lets smaller models match or surpass larger ones.
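One way to read "hard-to-easy curriculum" is to present training utterances from the hardest tier first. The Python sketch below makes two assumptions. First, it assumes difficulty is defined by benchmark tier. Second, it assumes the tiers rank studio < broadcast < spontaneous < synthetic noise. Neither assumption comes from the paper, and neither do the `Utterance` and `hard_to_easy` names, which are hypothetical. The study's actual curriculum criterion may differ.

```python
from dataclasses import dataclass

# ASSUMPTION: difficulty rank per tier (higher = harder). The tier names come
# from the Vividh-ASR summary; this exact ordering is illustrative only.
TIER_DIFFICULTY = {
    "studio": 0,
    "broadcast": 1,
    "spontaneous": 2,
    "synthetic_noise": 3,
}


@dataclass
class Utterance:
    # Hypothetical record type for one training example.
    audio_id: str
    tier: str
    transcript: str


def hard_to_easy(utterances: list[Utterance]) -> list[Utterance]:
    """Return utterances ordered from the hardest tier to the easiest.

    sorted() stays stable with reverse=True, so utterances within the same
    tier keep their original relative order.
    """
    return sorted(utterances, key=lambda u: TIER_DIFFICULTY[u.tier], reverse=True)


if __name__ == "__main__":
    data = [
        Utterance("a", "studio", "..."),
        Utterance("b", "spontaneous", "..."),
        Utterance("c", "broadcast", "..."),
    ]
    print([u.audio_id for u in hard_to_easy(data)])  # ['b', 'c', 'a']
```

Relying on sort stability means any shuffling done beforehand is preserved inside each tier, so the curriculum only changes the order between tiers.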

Impact: Addresses ASR model bias in low-resource languages, potentially improving performance on spontaneous speech and enabling more efficient model training.

Ranking rationale: The cluster contains an academic paper introducing a new benchmark and training methodology for ASR models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Kumarmanas Nethil · 2026-05-13 06:55

Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and…
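Diagnosing studio-bias as described in the abstract amounts to comparing word error rate (WER) per tier before and after fine-tuning. The following is a minimal Python sketch. It assumes whitespace tokenization with no text normalization, although real Hindi and Malayalam evaluations usually normalize script and punctuation first. The function names and the `(tier, reference, hypothesis)` input format are hypothetical and are not taken from the Vividh-ASR paper.

```python
from collections import defaultdict
from typing import Iterable


def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution (0 cost if words match)
            )
        prev = curr
    return prev[-1]


def wer_by_tier(samples: Iterable[tuple[str, str, str]]) -> dict[str, float]:
    """Corpus-level WER per tier from hypothetical (tier, reference, hypothesis) triples.

    Tokenizes on whitespace only; no script or punctuation normalization.
    """
    errors: dict[str, int] = defaultdict(int)
    words: dict[str, int] = defaultdict(int)
    for tier, reference, hypothesis in samples:
        ref_words = reference.split()
        errors[tier] += word_edit_distance(ref_words, hypothesis.split())
        words[tier] += len(ref_words)
    # Skip tiers whose references are all empty to avoid dividing by zero.
    return {tier: errors[tier] / words[tier] for tier in words if words[tier] > 0}


def shows_studio_bias(base: dict[str, float], tuned: dict[str, float],
                      read_tier: str = "studio",
                      spontaneous_tier: str = "spontaneous") -> bool:
    """True if fine-tuning lowered WER on read speech but raised it on spontaneous speech."""
    return (tuned[read_tier] < base[read_tier]
            and tuned[spontaneous_tier] > base[spontaneous_tier])
```

The code sums errors and reference words per tier instead of averaging per-utterance WER. This keeps very short utterances from dominating a tier's score.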

dev.to — LLM tag TIER_1 English(EN) · Nilofer 🚀 · 2026-05-15 19:53

ASR Evaluation Framework: Benchmarking Speech Recognition Models Across Accuracy, Speed, and Robustness

Picking an ASR model for production is not straightforward. Whisper might be the most accurate for general English but too slow for real-time use. Wav2Vec2 might be fast enough for edge devices but struggle with accented speech. Distil-Whisper might hit the sweet spot for your…
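Speed is commonly reported as a real-time factor (RTF), which is processing time divided by audio duration. Values below 1 mean the model transcribes faster than real time. The minimal sketch below assumes a hypothetical `transcribe(audio_path)` callable that wraps whichever model is under test, such as Whisper, Wav2Vec2, or Distil-Whisper. It times a single call and does not model warm-up, batching, or streaming.

```python
import time
from typing import Callable


def real_time_factor(transcribe: Callable[[str], str], audio_path: str,
                     audio_seconds: float) -> tuple[str, float]:
    """Time one call to a hypothetical `transcribe` function and return (text, RTF).

    RTF = wall-clock processing time / audio duration; below 1.0 means faster
    than real time. A single call includes any warm-up cost, so in practice
    you would discard the first run and average several.
    """
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    start = time.perf_counter()
    text = transcribe(audio_path)
    elapsed = time.perf_counter() - start
    return text, elapsed / audio_seconds
```

The sketch uses `time.perf_counter` because it is a monotonic, high-resolution clock intended for measuring short intervals, unlike `time.time`, which can jump if the system clock is adjusted.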
