Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling
Researchers have developed a new framework for multilingual automatic speech recognition (ASR) built on large language models (LLMs). The system uses a Mixture of Experts (MoE) architecture to improve cross-lingual performance, and a Continuous Integrate-and-Fire (CIF) mechanism that dynamically downsamples the speech encoder's output and aligns it with the LLM's text modality. The goal is more accurate and robust LLM-based ASR, and the researchers report significant improvements over existing models.
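For readers unfamiliar with CIF, the following is a minimal sketch of the generic integrate-and-fire idea in Python with NumPy. It is not the researchers' implementation. It assumes the model has already predicted one non-negative weight per encoder frame (the `alphas`), uses a firing threshold of 1.0, and simply drops any leftover weight at the end of the utterance. The function name `cif` is hypothetical.

```python
import numpy as np


def cif(hidden: np.ndarray, alphas: np.ndarray, threshold: float = 1.0) -> np.ndarray:
    """Hypothetical sketch of Continuous Integrate-and-Fire for one utterance.

    hidden: (T, D) encoder frames; alphas: (T,) non-negative per-frame weights.
    Returns an (N, D) array of token-level vectors, where N depends on the weights.
    """
    T, D = hidden.shape
    fired = []
    acc_weight = 0.0          # weight integrated since the last firing
    acc_state = np.zeros(D)   # weighted sum of frames since the last firing
    for t in range(T):
        remaining = float(alphas[t])
        # Fire whenever the accumulated weight reaches the threshold,
        # splitting the current frame's weight across the boundary.
        while acc_weight + remaining >= threshold:
            used = threshold - acc_weight
            acc_state = acc_state + used * hidden[t]
            fired.append(acc_state)
            remaining -= used
            acc_weight = 0.0
            acc_state = np.zeros(D)
        # Carry the rest of this frame's weight into the next integration.
        acc_weight += remaining
        acc_state = acc_state + remaining * hidden[t]
    # Assumption: a partial (unfired) tail is discarded.
    return np.stack(fired) if fired else np.zeros((0, D))


# Usage: four frames with weight 0.5 each integrate into two fired vectors.
hidden = np.arange(8, dtype=float).reshape(4, 2)
alphas = np.array([0.5, 0.5, 0.5, 0.5])
out = cif(hidden, alphas)
print(out.shape)  # (2, 2); rows are [1, 2] and [5, 6]
```

Here four encoder frames collapse into two token-level vectors. This is the "dynamic" part of the downsampling: the output length is set by the predicted weights rather than by a fixed stride.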
IMPACT: Introduces new techniques for improving multilingual ASR with LLMs, which could broaden the global accessibility of speech technologies.