A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

New methods use decoder-only LLMs to improve simultaneous speech translation

By the PulseAugur editorial team · [10 sources] · 2026-05-29 15:27

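Researchers are developing new approaches to simultaneous speech translation that center on decoder-only large language models. One system, AlignAtt4LLM, uses the attention of these models to guide translation from English into German, Italian, and Chinese, aiming to preserve translation quality even in low-latency settings. Another method, DOA, uses decoder-side attention inside SpeechLLMs to obtain alignment signals for long-form translation without any additional training. A third submission adds simultaneous translation capability to Canary, a 1-billion-parameter offline speech translation model, across several language pairs.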

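Impact: Advances in decoder-only LLM architectures and attention-based policies are improving the quality and efficiency of real-time speech translation.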

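Ranking rationale: Several research papers describe new methods and models for simultaneous speech translation submitted to IWSLT 2026 shared tasks.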


AI-generated summary · Google Gemini · based on 10 sources.

Sources [10]

arXiv cs.CL TIER_1 English(EN) · Enes Yavuz Ugan, Maike Züfle, Yuka Ko, Supriti Sinhamahapatra, Fabian Retkowski, Seymanur Akti, Jan Niehues, Alexander Waibel · 2026-06-04 04:00

Multilingual Long-Form Speech Instruction Following: KIT Submission to IWSLT 2026

arXiv:2606.04730v1 Announce Type: new Abstract: With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflecte…
arXiv cs.CL TIER_1 English(EN) · Alexander Waibel · 2026-06-03 11:13

Multilingual Long-Form Speech Instruction Following: KIT Submission to IWSLT 2026

With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruction Following Track, which …
arXiv cs.AI TIER_1 English(EN) · Quentin Fuxa, Dominik Macháček · 2026-06-03 04:00

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs in the IWSLT 2026 Simultaneous Speech Translation Task

arXiv:2606.03967v1 Announce Type: cross Abstract: We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated sou…
arXiv cs.CL TIER_1 English(EN) · Aziz Sharipov Ortega, Dominik Macháček · 2026-06-03 04:00

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

arXiv:2606.03948v1 Announce Type: new Abstract: We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task fo…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

A bilingual multi-attribute benchmark for instruction-guided speech editing is introduced to systematically evaluate speech modification capabilities across atomic and compositional tasks.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-02 17:52

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs in the IWSLT 2026 Simultaneous Speech Translation Task

We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…
arXiv cs.AI TIER_1 English(EN) · Dominik Macháček · 2026-06-02 17:52

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs in the IWSLT 2026 Simultaneous Speech Translation Task

We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…
arXiv cs.CL TIER_1 English(EN) · Dominik Macháček · 2026-06-02 17:37

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task for Czech to English and English to German and Ita…
arXiv cs.AI TIER_1 English(EN) · Sara Papi, Luisa Bentivogli · 2026-06-01 04:00

DOA: A Training-Free Decoder-Side Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

arXiv:2605.31432v1 Announce Type: cross Abstract: Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-base…
arXiv cs.AI TIER_1 English(EN) · Luisa Bentivogli · 2026-05-29 15:27

DOA: A Training-Free Decoder-Side Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-based encoder-decoder models where cross-attention pro…
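
Several of the abstracts above refer to an attention-based streaming policy that decides when to read more speech and when to write output. The sketch below is only an illustration of that general idea, written in the spirit of AlignAtt-style policies; it is not taken from any of the listed papers. The function names, the `guard_frames` parameter, and the toy attention values are all assumptions made for this example. A candidate token is written only if the source frame it attends to most lies outside the most recent `guard_frames` frames; otherwise the policy stops and waits for more audio.

```python
from typing import List, Sequence


def should_write(attn_row: Sequence[float], guard_frames: int) -> bool:
    """Return True (WRITE) if the most-attended source frame is not among
    the last `guard_frames` frames, otherwise False (READ more audio).

    `attn_row` holds one attention weight per source frame received so far.
    `guard_frames` is a hypothetical latency knob for this sketch.
    """
    num_frames = len(attn_row)
    if num_frames == 0:
        return False  # nothing heard yet, so keep reading
    # Index of the frame with the highest attention weight (first one on ties).
    most_attended = max(range(num_frames), key=lambda i: attn_row[i])
    return most_attended < num_frames - guard_frames


def emit_prefix(tokens: Sequence[str],
                attn: Sequence[Sequence[float]],
                guard_frames: int) -> List[str]:
    """Emit candidate tokens in order until one depends on very recent audio."""
    emitted: List[str] = []
    for token, row in zip(tokens, attn):
        if not should_write(row, guard_frames):
            break  # stop here and wait for more speech
        emitted.append(token)
    return emitted


# Toy example: three candidate tokens, five source frames received so far.
candidates = ["Guten", "Morgen", "zusammen"]
attention = [
    [0.70, 0.10, 0.10, 0.05, 0.05],  # peaks at frame 0
    [0.10, 0.20, 0.60, 0.05, 0.05],  # peaks at frame 2
    [0.05, 0.05, 0.10, 0.20, 0.60],  # peaks at frame 4 (most recent)
]
stable = emit_prefix(candidates, attention, guard_frames=2)
print(stable)
```

In this toy run the third token attends mostly to the newest frame, so only the first two tokens are emitted. A larger `guard_frames` value makes the sketch more conservative (higher latency), while a value of 0 writes every candidate immediately.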
