NAVER LABS Europe submitted a system to the IWSLT 2026 instruction-following speech processing short track, where it tied for first place. The approach is a multi-stage training pipeline that uses SpeechMapper to learn a speech-to-LLM embedding projector from ASR data alone. The team also built fakACL, a synthetic dataset generated with SeamlessM4T-large-v2, to improve performance on scientific presentation tasks. The updated system surpasses last year's best performance while being more compact and using a less powerful LLM backbone.
AI IMPACT This research advances speech processing by combining LLMs with novel projection methods, potentially improving cross-lingual speech translation and understanding tasks.
RANK_REASON Submission to an academic track at a conference with a detailed paper.
- arXiv
- fakACL
- Hugging Face
- IWSLT 2026
- Marcely Zanon Boito
- NAVER LABS Europe
- SeamlessM4T-large-v2
- SpeechMapper
AI-generated summary · Google Gemini · from 1 source.