New benchmark tackles ASR bias in Indic languages

By PulseAugur Editorial · [2 sources] · 2026-05-13 06:55

Researchers have developed Vividh-ASR, a benchmark for evaluating automatic speech recognition (ASR) models on two Indic languages, Hindi and Malayalam. The benchmark sorts audio into four complexity tiers (studio, broadcast, spontaneous, and synthetic noise) to expose "studio-bias", where models perform well on read speech but poorly on spontaneous audio. The study found that specific training strategies, such as large parameter updates early in training and a hard-to-easy curriculum, significantly improve performance, especially on spontaneous speech. The authors also introduce a parameter-efficient training recipe, Reverse Multi-Stage Fine-Tuning (R-MFT), which allows smaller models to match or surpass larger ones.
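To make the idea of a tiered evaluation concrete, here is a minimal sketch of how per-tier scores might be computed. It assumes word error rate (WER) as the metric, which the summary does not confirm, and the function names, the tier label spellings, and the sample transcripts are all hypothetical placeholders rather than anything taken from the benchmark.

```python
from collections import defaultdict

# Tier labels follow the four tiers named in the summary; the exact
# spellings used by the benchmark are an assumption.
TIERS = ("studio", "broadcast", "spontaneous", "synthetic_noise")


def word_edit_distance(ref_words, hyp_words):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer_by_tier(samples):
    """Aggregate WER per tier from (tier, reference, hypothesis) triples."""
    errors = defaultdict(int)
    words = defaultdict(int)
    for tier, reference, hypothesis in samples:
        ref_words = reference.split()
        errors[tier] += word_edit_distance(ref_words, hypothesis.split())
        words[tier] += len(ref_words)
    return {tier: errors[tier] / words[tier] for tier in errors if words[tier]}


def studio_gap(scores):
    """Hypothetical studio-bias indicator: spontaneous WER minus studio WER."""
    return scores["spontaneous"] - scores["studio"]


# Placeholder transcripts, not benchmark data.
demo = [
    ("studio", "a b c d", "a b c d"),
    ("spontaneous", "a b c d", "a x c"),
]
demo_scores = wer_by_tier(demo)
print(demo_scores, studio_gap(demo_scores))
```

Reporting one score per tier, rather than a single pooled number, is what lets a gap between read and spontaneous speech show up at all; a pooled WER dominated by studio audio could hide it.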

IMPACT Addresses ASR model bias in low-resource languages, potentially improving performance for spontaneous speech and enabling more efficient model training.



AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CL TIER_1 English(EN) · Kumarmanas Nethil · 2026-05-13 06:55

Vividh-ASR: A Complexity-Tiered Benchmark and Optimization Dynamics for Robust Indic Speech Recognition

Fine-tuning multilingual ASR models like Whisper for low-resource languages often improves read speech but degrades spontaneous audio performance, a phenomenon we term studio-bias. To diagnose this mismatch, we introduce Vividh-ASR, a complexity-stratified benchmark for Hindi and…

dev.to — LLM tag TIER_1 English(EN) · Nilofer 🚀 · 2026-05-15 19:53

ASR Evaluation Framework: Benchmarking Speech Recognition Models Across Accuracy, Speed, and Robustness

Picking an ASR model for production is not straightforward. Whisper might be the most accurate for general English but too slow for real-time use. Wav2Vec2 might be fast enough for edge devices but struggle with accented speech. Distil-Whisper might hit the sweet spot for your…
