Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

SpeechLLMs show uneven results on translation benchmarks; cascaded systems still dominate

By the PulseAugur editorial team · [1 source] · 2026-04-28 04:00

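A new comprehensive test suite, "Hearing to Translate," has been developed to evaluate how effective it is to integrate the speech modality directly into large language models (LLMs) for speech-to-text translation. The study benchmarks six state-of-the-art SpeechLLMs against sixteen cascaded systems, analyzing performance across 16 benchmarks, 13 language pairs, and 9 challenging conditions. The results indicate that cascaded systems remain the most reliable overall, but recent SpeechLLMs can match or even surpass them in specific scenarios, while standalone Speech Foundation Models (SFMs) generally lag behind.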

Impact: A new benchmark for SpeechLLMs may accelerate research into more efficient and more accurate speech translation systems.

Ranking rationale: This is a research paper introducing a new benchmark suite for evaluating SpeechLLMs.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Sara Papi, Javier Garcia Gilabert, Zachary Hopton, Vilém Zouhar, Carlos Escolano, Gerard I. Gállego, Jorge Iranzo-Sánchez, Ahrii Kim, Dominik Macháček, Patricia Schmidtova, Maike Züfle · 2026-04-28 04:00

Hearing to Translate: The Effectiveness of Speech Modality Integration into LLMs

arXiv:2512.16378v4 Announce Type: replace Abstract: As Large Language Models (LLMs) expand beyond text, integrating speech as a native modality has given rise to SpeechLLMs, which directly process spoken language and enable speech-to-text translation (ST) and other downstream tas…
