PulseAugur

New methods enhance simultaneous speech translation with decoder-only LLMs

Researchers are developing new methods for simultaneous speech translation with decoder-only large language models. One approach, AlignAtt4LLM, adapts the AlignAtt attention policy to these models to improve translation quality from English into German, Italian, and Chinese, even in low-latency scenarios. Another method, DOA, uses self-attention within SpeechLLMs to derive alignment signals for long-form translation without requiring retraining. A third system adds simultaneous translation capability to Canary, a 1-billion-parameter offline direct speech-to-text translation model, by applying the AlignAtt policy across multiple language pairs.
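For readers unfamiliar with attention-based read/write policies, the following is a minimal sketch of the general AlignAtt idea as it is commonly described: a candidate target token is emitted only if the source position it attends to most strongly lies outside the most recent frames of input; otherwise the system stops and waits for more speech. The function name `alignatt_emit`, the parameter `guard_frames`, and the toy attention matrix are hypothetical illustrations, not code or settings taken from any of the papers listed here.

```python
from typing import Sequence


def alignatt_emit(
    attention: Sequence[Sequence[float]],
    num_source_frames: int,
    guard_frames: int,
) -> int:
    """Return how many candidate target tokens are safe to emit.

    attention[i][j] is the attention weight that candidate token i
    places on source position j (hypothetical toy representation).
    """
    # Source positions at or beyond this index count as "too recent".
    threshold = num_source_frames - guard_frames
    emitted = 0
    for row in attention:
        # Source position that this candidate token attends to most.
        most_attended = max(range(num_source_frames), key=lambda j: row[j])
        if most_attended >= threshold:
            # The token depends on very recent input: stop and read more.
            break
        emitted += 1
    return emitted


# Toy example: 4 candidate tokens over 4 source frames.
toy_attention = [
    [0.70, 0.20, 0.05, 0.05],
    [0.10, 0.60, 0.20, 0.10],
    [0.05, 0.05, 0.20, 0.70],
    [0.90, 0.05, 0.03, 0.02],
]
safe_count = alignatt_emit(toy_attention, num_source_frames=4, guard_frames=1)
```

In this toy case the third candidate token attends mainly to the last frame, so emission stops after two tokens. A larger `guard_frames` value makes the sketch more conservative (higher latency), while a value of zero lets every candidate through.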

IMPACT Advances in decoder-only LLM architectures and attention policies are improving the quality and efficiency of real-time speech translation.

RANK_REASON Multiple research papers detailing new methods and models for simultaneous speech translation, several of them submissions to IWSLT 2026 shared tasks.

AI-generated summary · Google Gemini · from 10 sources.

COVERAGE [10]

  1. arXiv cs.CL TIER_1 English(EN) · Enes Yavuz Ugan, Maike Züfle, Yuka Ko, Supriti Sinhamahapatra, Fabian Retkowski, Seymanur Akti, Jan Niehues, Alexander Waibel ·

    Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

    arXiv:2606.04730v1. Abstract: With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflecte…

  2. arXiv cs.CL TIER_1 English(EN) · Alexander Waibel ·

    Multilingual Long-Form Speech Instruction Following: KIT's Submission to IWSLT 2026

    With the advent of Large Language Models, single-task and token-based multi-task models have evolved into instruction-based systems that infer task and target language implicitly from natural language prompts. This trend is reflected in IWSLT's Instruction Following Track, which …

  3. arXiv cs.AI TIER_1 English(EN) · Quentin Fuxa, Dominik Macháček ·

    AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

    arXiv:2606.03967v1. Abstract: We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated sou…

  4. arXiv cs.CL TIER_1 English(EN) · Aziz Sharipov Ortega, Dominik Macháček ·

    A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

    arXiv:2606.03948v1. Abstract: We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task fo…

  5. Hugging Face Daily Papers TIER_1 English(EN) ·

    SpeechEditBench: A Bilingual Multi-Attribute Benchmark for Instruction-Guided Speech Editing

    A bilingual multi-attribute benchmark for instruction-guided speech editing is introduced to systematically evaluate speech modification capabilities across atomic and compositional tasks.

  6. Hugging Face Daily Papers TIER_1 English(EN) ·

    AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

    We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…

  7. arXiv cs.AI TIER_1 English(EN) · Dominik Macháček ·

    AlignAtt4LLM: Fast AlignAtt for Decoder-Only LLMs at IWSLT 2026 Simultaneous Speech Translation Task

    We describe AlignAtt4LLM, an IWSLT 2026 simultaneous speech translation system for English to German, Italian, and Chinese. The system is a synchronous cascade: Qwen3-ASR with forced alignment produces an incrementally updated source transcript, and Gemma-4 E4B-it translates that…

  8. arXiv cs.CL TIER_1 English(EN) · Dominik Macháček ·

    A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

    We implement simultaneous translation capability with the offline direct speech-to-text translation model Canary, using the state-of-the-art policy AlignAtt, and submit it to IWSLT 2026 Simultaneous Speech Translation Shared task for Czech to English and English to German and Ita…

  9. arXiv cs.AI TIER_1 English(EN) · Sara Papi, Luisa Bentivogli ·

    DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

    arXiv:2605.31432v1. Abstract: Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-base…

  10. arXiv cs.AI TIER_1 English(EN) · Luisa Bentivogli ·

    DOA: Training-Free Decoder-Only Attention Policy for Long-Form Simultaneous Translation with SpeechLLMs

    Simultaneous speech-to-text translation (SimulST) generates translations while speech is still unfolding, requiring a streaming policy that decides when to read and when to write. State-of-the-art approaches rely on attention-based encoder-decoder models where cross-attention pro…