PulseAugur

Whisper ASR model improved for low-resource languages

Researchers have identified decoder inconsistencies in the Whisper ASR model that contribute to higher word error rates (WER) for Dravidian and other low-resource languages. Their linguistic and dataset analysis finds that these languages have longer words, greater vocabulary diversity, and less repetition, which leads to sparse token distributions and substitution errors. To address this, the paper proposes two decoder enhancements: Weighted-Attention, which balances linguistic and acoustic cues, and Self-Conditioning, which improves token consistency by reinjecting intermediate predictions. The paper reports that these methods reduce word error rates for agglutinative and low-resource languages.
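To make the quantities in this summary concrete, here is a minimal Python sketch of the standard word error rate (word-level edit distance divided by reference length) alongside rough proxies for the three corpus properties mentioned above. It is not the paper's analysis code: the function names `lexical_stats` and `word_error_rate`, the whitespace tokenization, and the specific proxies (mean word length, type-token ratio, share of repeated tokens) are all assumptions chosen for illustration.

```python
from collections import Counter


def lexical_stats(text: str) -> dict:
    """Whitespace-tokenized proxies (an assumption, not the paper's metrics)."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    counts = Counter(words)
    return {
        # Average number of characters per word.
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Distinct words divided by total words (vocabulary diversity).
        "type_token_ratio": len(counts) / len(words),
        # Share of tokens whose word form occurs more than once (repetition).
        "repeated_token_share": sum(c for c in counts.values() if c > 1) / len(words),
    }


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)
```

With whitespace tokenization, a corpus of long, rarely repeated word forms yields a higher type-token ratio and a lower repeated-token share, which is one way to picture the sparsity described above. Because WER counts a word as wrong if any part of it differs, a single incorrect token inside a long word costs a full substitution.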

IMPACT Introduces two decoder-level techniques to improve ASR performance for underrepresented languages, potentially broadening access to AI speech technologies.

RANK_REASON Academic paper detailing technical improvements to an existing model.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.CL TIER_1 English(EN) · Pankaj Wasnik

    Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages

    Multilingual ASR models such as Whisper perform well on high-resource languages but exhibit substantially higher Word Error Rates (WER) for Dravidian languages compared to Indo-Aryan ones. Through linguistic and dataset analysis, we show that Dravidian languages have longer words…