Brief · PulseAugur

RESEARCH · arXiv cs.CL · 1w · [10 sources]

A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026

Researchers are developing new methods for simultaneous speech translation built on decoder-only large language models. One approach, AlignAtt4LLM, adapts attention mechanisms for these models to improve translation quality for languages such as German and Italian, even in low-latency settings. Another method, Decoder-Only Attention (DOA), derives alignment signals for long-form translation from the self-attention of SpeechLLMs, with no retraining required. In addition, Canary, a 1-billion-parameter system, offers offline simultaneous translation capabilities for multiple languages.

IMPACT Advances in decoder-only LLM architectures and attention policies are improving the quality and efficiency of real-time speech translation.

Decoder-Only Attention (DOA)
Phi4-Multimodal
Speech Large Language Models (SpeechLLMs)
Qwen3-Omni
IWSLT 2026
Gemma-4 E4B-it
AlignAtt4LLM
Canary
Qwen3-ASR
KIT