Streaming Speech-to-Text Translation with a SpeechLLM

SpeechLLM achieves real-time translation with a latency of only 1-2 seconds

By the PulseAugur editorial team · [1 source] · 2026-05-14 12:32

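Researchers have developed a new SpeechLLM architecture designed for real-time speech-to-text translation. Unlike earlier systems, which either process a whole utterance or emit output at fixed intervals, the model learns to decide when it has received enough audio input to generate a translation. This approach achieves markedly lower latency, roughly 1-2 seconds, while keeping translation quality comparable to non-streaming methods.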
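The summary above describes a model that chooses between waiting for more audio and emitting translated text. The following is a minimal sketch of that general read/write loop, not the paper's method: the class `StreamingTranslator`, the `WAIT` sentinel, and `toy_policy` are all hypothetical names introduced for illustration, audio chunks are stood in for by plain strings, and the "learned" decision is replaced by a fixed one-chunk lookahead rule.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

WAIT = None  # hypothetical sentinel: the policy asks for more audio


@dataclass
class StreamingTranslator:
    # decide(audio_so_far, text_so_far, ended) returns the next output
    # token, or WAIT if more input is needed. In a real system this would
    # be the model's own prediction; here it is an injected function.
    decide: Callable[[List[str], List[str], bool], Optional[str]]
    audio: List[str] = field(default_factory=list)
    output: List[str] = field(default_factory=list)

    def _emit(self, ended: bool) -> List[str]:
        """Emit tokens until the policy asks to wait."""
        emitted = []
        while True:
            token = self.decide(self.audio, self.output, ended)
            if token is WAIT:
                return emitted
            self.output.append(token)
            emitted.append(token)

    def feed(self, chunk: str) -> List[str]:
        """Append one audio chunk and return any newly emitted tokens."""
        self.audio.append(chunk)
        return self._emit(ended=False)

    def finish(self) -> List[str]:
        """Signal end of stream and flush the remaining tokens."""
        return self._emit(ended=True)


def toy_policy(audio: List[str], text: List[str], ended: bool) -> Optional[str]:
    # Assumption for illustration only: "translate" chunk i by upper-casing
    # it, and do so only once chunk i+1 has arrived (or the stream ended).
    i = len(text)
    lookahead = 0 if ended else 1
    if i + lookahead < len(audio):
        return audio[i].upper()
    return WAIT


translator = StreamingTranslator(toy_policy)
for chunk in ["guten", "morgen", "welt"]:
    print(chunk, "->", translator.feed(chunk))
print("end ->", translator.finish())
```

In this sketch, latency is the gap between a chunk arriving and its token being emitted. A learned policy would vary that gap with the content of the audio rather than fixing it at one chunk.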

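Impact: By substantially reducing the latency of speech-to-text translation systems, this work makes real-time translation applications practical.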

Ranking rationale: This cluster contains one academic paper detailing a new model architecture and its performance.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Rogier C. van Dalen · 2026-05-14 12:32

Streaming Speech-to-Text Translation with a SpeechLLM

Normally, a system that translates speech into text consists of separate modules for speech recognition and text-to-text translation. Combining those tasks into a SpeechLLM promises to exploit paralinguistic information in the speech and to reduce cascaded errors. But existing Sp…