New methods enhance simultaneous speech translation with decoder-only LLMs

By PulseAugur Editorial · [10 sources] · 2026-05-29 15:27

Researchers are developing new methods for simultaneous speech translation, focusing on decoder-only large language models. One approach, AlignAtt4LLM, adapts the attention-based AlignAtt policy to decoder-only LLMs in a cascade system that translates English into German, Italian, and Chinese, with the aim of keeping translation quality high at low latency. Another method, DOA, derives read/write decisions from the self-attention of SpeechLLMs for long-form translation, without requiring retraining. Additionally, CUNI's submission adds simultaneous translation capability to Canary, an offline direct speech-to-text translation model, by applying the AlignAtt policy for Czech to English and English to German and Italian.
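AlignAtt4LLM and the CUNI Canary system both rely on AlignAtt, a policy that decides when to write by checking where each new target token attends in the source speech. What follows is a minimal pure-Python sketch of that decision rule. It assumes an attention matrix already averaged over heads and layers, and the function and parameter names are hypothetical, not taken from either submission:

```python
from typing import Sequence


def alignatt_prefix_length(attn: Sequence[Sequence[float]], frame_threshold: int) -> int:
    """Count how many leading tokens of a candidate translation may be written now.

    attn[i][j] is the attention weight from target token i to source frame j,
    over all speech frames received so far. A token is withheld, together with
    every token after it, when its most-attended frame is among the last
    `frame_threshold` frames, i.e. it may depend on speech not yet heard.
    """
    emitted = 0
    for row in attn:
        n_frames = len(row)
        if n_frames == 0:
            break  # no audio yet: nothing can be aligned, so keep reading
        # The frame this token attends to most is taken as its alignment.
        aligned = max(range(n_frames), key=lambda j: row[j])
        if aligned >= n_frames - frame_threshold:
            break  # aligned too close to the end of the audio: READ more speech
        emitted += 1  # aligned safely in the past: WRITE this token
    return emitted


# Toy example: 4 candidate tokens attending over 4 received frames.
example = [
    [0.70, 0.20, 0.05, 0.05],  # aligned to frame 0
    [0.10, 0.60, 0.20, 0.10],  # aligned to frame 1
    [0.05, 0.10, 0.25, 0.60],  # aligned to frame 3, the newest frame
    [0.50, 0.20, 0.20, 0.10],  # never reached: blocked by the token above
]
print(alignatt_prefix_length(example, frame_threshold=2))  # → 2
```

Lowering `frame_threshold` lets the model write earlier (lower latency), at the risk of committing tokens that depend on speech it has not fully heard. Raising it does the opposite.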

IMPACT Advances in decoder-only LLM architectures and attention policies are improving the quality and efficiency of real-time speech translation.

RANK_REASON Multiple research papers detailing new methods and models for simultaneous speech translation, most of them submissions to IWSLT 2026 shared tasks.

AI-generated summary · Google Gemini · from 10 sources.

COVERAGE [10]

arXiv cs.CL TIER_1 English(EN) · Enes Yavuz Ugan, Maike Züfle, Yuka Ko, Supriti Sinhamahapatra, Fabian Retkowski, Seymanur Akti, Jan Niehues, Alexander Waibel · 2026-06-04 04:00

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

arXiv:2606.04730v1 · Abstract: With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflecte…
arXiv cs.CL TIER_1 English(EN) · Alexander Waibel · 2026-06-03 11:13

Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruction Following Track, which …
arXiv cs.AI TIER_1 English(EN) · Quentin Fuxa, Dominik Macháček · 2026-06-03 04:00

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

arXiv:2606.03967v1 · Abstract: We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated sou…
arXiv cs.CL TIER_1 English(EN) · Aziz Sharipov Ortega, Dominik Macháček · 2026-06-03 04:00

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

arXiv:2606.03948v1 · Abstract: We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task fo…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-03 00:00

SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

A bilingual multi-attribute benchmark for instruction-guided speech editing is introduced to systematically evaluate speech modification capabilities across atomic and compositional tasks.
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-02 17:52

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…
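The synchronous cascade described above can be pictured as a loop that runs one ASR update and one translation step per incoming audio chunk. The sketch below uses hypothetical callables as stand-ins for Qwen3-ASR, which returns words with forced-alignment end times, and for the Gemma-4 E4B-it translator. It also uses a simple, assumed stability margin to decide which source words to pass on. It does not reproduce how AlignAtt4LLM actually selects or commits text. In a real system, the translate step is where an AlignAtt-style stop rule would act.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Word = Tuple[str, float]  # (word, end time in seconds from forced alignment)


@dataclass
class SyncCascade:
    # Hypothetical stand-in for ASR + forced alignment: seconds of audio -> aligned words.
    asr: Callable[[float], List[Word]]
    # Hypothetical stand-in for the LLM: (stable source words, committed target) -> new target words.
    translate: Callable[[List[str], List[str]], List[str]]
    # Assumption: words ending within this many seconds of the audio edge may still change.
    margin_s: float = 1.0
    target: List[str] = field(default_factory=list)

    def step(self, audio_end_s: float) -> List[str]:
        """One synchronous update: re-run ASR, then one translation step."""
        words = self.asr(audio_end_s)
        stable = [w for w, end in words if end <= audio_end_s - self.margin_s]
        new = self.translate(stable, self.target)
        self.target.extend(new)  # committed output is never retracted
        return new


# Toy stubs so the loop runs without any model.
TRANSCRIPT = [("hello", 0.5), ("world", 1.2), ("again", 2.0)]


def fake_asr(audio_end_s: float) -> List[Word]:
    return [(w, end) for w, end in TRANSCRIPT if end <= audio_end_s]


def fake_translate(source: List[str], target: List[str]) -> List[str]:
    # One-to-one "translation" of source words not yet covered by the target.
    return [w.upper() for w in source[len(target):]]


cascade = SyncCascade(fake_asr, fake_translate)
for t in (1.0, 2.0, 3.0):
    print(t, cascade.step(t))
```

With these stubs, the loop prints `1.0 []`, `2.0 ['HELLO']` and `3.0 ['WORLD', 'AGAIN']`. No source word is translated until its end time falls at least the assumed one-second margin behind the audio received so far.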
arXiv cs.AI TIER_1 English(EN) · Dominik Macháček · 2026-06-02 17:52

AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…
arXiv cs.CL TIER_1 English(EN) · Dominik Macháček · 2026-06-02 17:37

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task for Czech to English and English to German and Ita…
arXiv cs.AI TIER_1 English(EN) · Sara Papi, Luisa Bentivogli · 2026-06-01 04:00

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

arXiv:2605.31432v1 · Abstract: Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-base…
arXiv cs.AI TIER_1 English(EN) · Luisa Bentivogli · 2026-05-29 15:27

DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-based encoder-decoder models where cross-attention pro…
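A decoder-only SpeechLLM has no separate cross-attention. Speech embeddings sit in the same sequence as the prompt and the generated text, so any alignment has to be read from self-attention. The sketch below illustrates that general idea under stated assumptions: speech occupies one contiguous span of positions, and an AlignAtt-style last-frames rule is applied to each generated token's renormalized attention over that span. It is not the DOA algorithm itself, whose details are cut off in the abstract, and all names are hypothetical.

```python
from typing import List, Sequence


def speech_alignment_rows(
    self_attn: Sequence[Sequence[float]],
    speech_start: int,
    speech_end: int,
    first_generated: int,
) -> List[List[float]]:
    """Keep only generated-token -> speech-position self-attention weights.

    self_attn[i][j] is the (head/layer-averaged) attention from sequence
    position i to position j. Positions speech_start..speech_end-1 hold speech
    embeddings; positions from first_generated onward are generated text tokens.
    """
    rows = []
    for i in range(first_generated, len(self_attn)):
        span = list(self_attn[i][speech_start:speech_end])
        total = sum(span)
        # Renormalize over speech positions only; attention to prompt and text
        # positions is discarded. The argmax used below is unaffected.
        rows.append([w / total for w in span] if total > 0 else span)
    return rows


def tokens_to_emit(rows: Sequence[Sequence[float]], frame_threshold: int) -> int:
    """AlignAtt-style rule: stop at the first token aligned to the newest frames."""
    emitted = 0
    for row in rows:
        if not row:
            break
        aligned = max(range(len(row)), key=row.__getitem__)
        if aligned >= len(row) - frame_threshold:
            break
        emitted += 1
    return emitted


# Toy layout: position 0 = prompt, 1..4 = speech, 5..6 = generated tokens.
seq_len = 7
attn = [[0.0] * seq_len for _ in range(5)]  # rows for prompt/speech (unused)
attn.append([0.20, 0.40, 0.20, 0.10, 0.05, 0.05, 0.00])  # token 5 -> speech pos 1
attn.append([0.10, 0.05, 0.05, 0.10, 0.50, 0.10, 0.10])  # token 6 -> speech pos 4
rows = speech_alignment_rows(attn, speech_start=1, speech_end=5, first_generated=5)
print(tokens_to_emit(rows, frame_threshold=2))  # → 1
```

Because the rule only reads attention weights that the model already computes, a policy of this kind needs no extra training. This is consistent with the "training-free" framing in the DOA title, though the paper's exact procedure may differ.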