Direct Preference Optimization for English-Mandarin Code-Switching Speech Recognition in Audio LLMs
Researchers have developed a method that uses Direct Preference Optimization (DPO) to improve how audio large language models transcribe speech that switches between English and Mandarin. These models often fail on such input by dropping one of the languages, translating instead of transcribing, or hallucinating content. After training on 100,000 preference pairs, the models learned to preserve the mixed-language content, significantly reducing transcription errors.
AI IMPACT: Enhances the accuracy of multilingual speech recognition in LLMs, potentially improving global accessibility and usability.
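
For readers unfamiliar with DPO, the following is a minimal sketch of the standard DPO loss for a single preference pair, not the researchers' actual training code. The example pair, the log-probability values, and the `beta` setting are all hypothetical and chosen only for illustration. The "chosen" transcript keeps both languages, while the "rejected" one shows the translate-instead-of-transcribe failure described in the summary.

```python
import math

# Hypothetical preference pair (illustrative only, not from the dataset).
pair = {
    "chosen": "我今天要去 meeting",                 # preserves the code-switch
    "rejected": "I am going to a meeting today",   # translated, not transcribed
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one pair of sequence log-probabilities.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    beta=0.1 is an assumed value, not one reported in the summary.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable form of -log(sigmoid(margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Made-up log-probabilities: the policy already favors the chosen transcript
# more than the frozen reference model does, so the loss falls below log(2).
loss = dpo_loss(policy_chosen_logp=-5.0, policy_rejected_logp=-9.0,
                ref_chosen_logp=-6.0, ref_rejected_logp=-6.0)
print(f"DPO loss for the example pair: {loss:.4f}")
```

Minimizing this loss raises the likelihood of the mixed-language transcript relative to the failure case, measured against a frozen reference model, which is how preference pairs can steer a model away from omission, translation, and hallucination without a separate reward model.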