A Pocket Offline Model for Simultaneous Speech Translation as CUNI Submission to IWSLT 2026
Researchers are developing new methods for simultaneous speech translation, focusing on decoder-only large language models. One approach, AlignAtt4LLM, adapts attention mechanisms for these models to improve translation quality for languages like German and Italian, even in low-latency scenarios. Another method, DOA, uses self-attention within SpeechLLMs to derive alignment signals for long-form translation without requiring retraining. In addition, Canary, a 1-billion-parameter system, offers offline simultaneous translation capabilities for multiple languages.
IMPACT: Advances in decoder-only LLM architectures and attention policies are improving the quality and efficiency of real-time speech translation.