Brief · PulseAugur

RESEARCH · arXiv cs.CL · 1d · [2 sources]

Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling

Researchers have proposed a framework for multilingual automatic speech recognition (ASR) built on large language models (LLMs). The system uses a Mixture of Experts (MoE) architecture to improve cross-lingual performance and a Continuous Integrate-and-Fire (CIF) mechanism that dynamically downsamples the speech representation and aligns it with the text modality. The authors report significant improvements over existing models, with the aim of making LLM-based ASR more accurate and robust.
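For readers unfamiliar with CIF, the following is a minimal sketch of the general integrate-and-fire idea, not the paper's implementation. It assumes each encoder frame already comes with a weight between 0 and 1 (in a real system a small learned layer would predict these), and it assumes that any partially accumulated remainder at the end of the utterance is simply dropped. The function name `cif_downsample` and the toy inputs are hypothetical.

```python
from typing import List


def cif_downsample(
    frames: List[List[float]],
    weights: List[float],
    threshold: float = 1.0,
) -> List[List[float]]:
    """Integrate weighted frames and emit one vector each time the
    accumulated weight reaches `threshold` (illustrative sketch only)."""
    if len(frames) != len(weights):
        raise ValueError("frames and weights must have the same length")

    dim = len(frames[0]) if frames else 0
    outputs: List[List[float]] = []
    acc_weight = 0.0          # weight accumulated since the last fire
    acc_vec = [0.0] * dim     # weighted sum of frames since the last fire

    for frame, w in zip(frames, weights):
        if not 0.0 <= w <= threshold:
            # Assumption: weights are bounded, so a frame fires at most once.
            raise ValueError("each weight must lie in [0, threshold]")

        if acc_weight + w < threshold:
            # Not enough weight yet: keep integrating.
            acc_weight += w
            acc_vec = [a + w * x for a, x in zip(acc_vec, frame)]
        else:
            # Fire: use only the part of this frame's weight needed to
            # reach the threshold, then carry the rest into the next token.
            used = threshold - acc_weight
            outputs.append([a + used * x for a, x in zip(acc_vec, frame)])
            rest = w - used
            acc_weight = rest
            acc_vec = [rest * x for x in frame]

    # Assumption: a trailing remainder below the threshold is discarded.
    return outputs


if __name__ == "__main__":
    toy_frames = [[2.0], [4.0], [8.0], [0.0], [6.0]]
    toy_weights = [0.5, 0.5, 0.25, 0.75, 0.5]
    # Five input frames are reduced to two output vectors.
    print(cif_downsample(toy_frames, toy_weights))  # [[3.0], [2.0]]
```

Because the number of output vectors depends on the sum of the weights rather than on a fixed stride, the downsampling rate can vary with the content of the speech, which is the sense in which the brief calls it dynamic.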

IMPACT Introduces novel techniques for improving multilingual ASR performance using LLMs, potentially enhancing global accessibility of speech technologies.

Mixture of Experts
LLM
Continuous Integrate-and-Fire
Large Language Models