Overcoming Decoder Inconsistencies in Whisper for Dravidian and Low-Resource Languages
Researchers have identified decoder inconsistencies in the Whisper ASR model that lead to higher word error rates for Dravidian and other low-resource languages. They found that these languages have longer words, greater vocabulary diversity, and less repetition. This produces sparse token distributions, which in turn cause substitution errors. To address this, the paper proposes two decoder enhancements: Weighted-Attention, which balances linguistic and acoustic cues, and Self-Conditioning, which improves token consistency by reinjecting intermediate predictions into the decoder. Both methods reduced word error rates for agglutinative and low-resource languages.
IMPACT: Introduces specific techniques to improve ASR performance for underrepresented languages, potentially broadening access to AI speech technologies.
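The summary names the two decoder enhancements but does not give their formulas. The sketch below is only a generic illustration of how such mechanisms are commonly built, not the paper's implementation. It assumes that "Weighted-Attention" can be read as a convex blend of the decoder's self-attention output (linguistic cue) and cross-attention output (acoustic cue), controlled by a hypothetical weight `alpha`. It also assumes that "Self-Conditioning" works like self-conditioned CTC: an intermediate softmax prediction is mapped back to model space and added to the hidden state. The names `weighted_attention`, `self_condition`, `alpha`, `vocab_proj`, and `embed` are all hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_attention(self_ctx: Vector, cross_ctx: Vector, alpha: float) -> Vector:
    """Convex blend of the decoder's linguistic context (self-attention output)
    and acoustic context (cross-attention output over encoder states).

    ASSUMPTION: `alpha` is a hypothetical scalar weight; alpha=1 uses only
    the acoustic context, alpha=0 only the linguistic context.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(self_ctx) != len(cross_ctx):
        raise ValueError("context vectors must have the same dimension")
    return [(1.0 - alpha) * s + alpha * c for s, c in zip(self_ctx, cross_ctx)]


def self_condition(hidden: Vector, vocab_proj: Matrix, embed: Matrix) -> Vector:
    """Reinject an intermediate prediction into a decoder hidden state.

    ASSUMPTION (hypothetical, modeled on self-conditioned CTC):
    1. project `hidden` to vocabulary logits with `vocab_proj` (V rows of size d),
    2. turn the logits into a distribution with softmax,
    3. map the distribution back to model space as an expected embedding
       using `embed` (V rows of size d),
    4. add that feedback residually to `hidden`.
    """
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in vocab_proj]
    probs = softmax(logits)
    dim = len(hidden)
    # Expected embedding under the intermediate token distribution.
    feedback = [sum(p * row[d] for p, row in zip(probs, embed)) for d in range(dim)]
    return [h + f for h, f in zip(hidden, feedback)]


if __name__ == "__main__":
    # Toy 2-dimensional example with a 2-token vocabulary.
    mixed = weighted_attention([1.0, 0.0], [0.0, 1.0], alpha=0.25)
    identity = [[1.0, 0.0], [0.0, 1.0]]
    refined = self_condition(mixed, vocab_proj=identity, embed=identity)
    print(mixed, refined)
```

In this toy setup, `alpha=0.25` keeps most of the linguistic context. The self-conditioning step then nudges the hidden state toward the embedding of the token the intermediate prediction already favors. This gives one plausible reading of how reinjecting predictions could encourage more consistent tokens when the output distribution is sparse.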