Alibaba Qwen3.5 model offers real-time translation with voice cloning

By PulseAugur Editorial · [3 sources] · 2026-05-20 08:09

Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that processes audio and video simultaneously and delivers translated speech at 2.8 seconds of latency. The model accepts 60 input languages and produces speech output in 29 languages. It also uses visual cues such as lip movements to improve accuracy in noisy environments. A standout feature is real-time voice cloning: the translated output is spoken in a voice modeled on the original speaker's, which makes the result sound more natural.

IMPACT Improves real-time multilingual communication through low latency, better accuracy from combined audio and visual input, and more natural-sounding output from voice cloning.

RANK_REASON Model release from a major AI lab (Alibaba) with significant performance improvements and new capabilities.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

MarkTechPost TIER_1 English(EN) · Asif Razzaq · 2026-05-20 08:09

Alibaba Qwen Team Introduces Qwen3.5-LiveTranslate-Flash: Real-Time Multimodal Interpretation Across 60 Languages at 2.8-Second Latency

Alibaba's Qwen team has released Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that processes audio and video simultaneously. The model covers 60 input languages and produces speech output in 29 languages at 2.8 seconds of latency. Key additions over th…

Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] · 2026-05-20 09:51

Alibaba's Qwen team has unveiled Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model that processes audio and video simultaneously. The model covers 60 input languages and produces speech output in 29 languages at just 2.8 seconds latency. Key features include r…

LINKS marktechpost.com/…/alibaba-qwen-team-intr…

Mastodon — mastodon.social TIER_1 English(EN) · [email protected] · 2026-05-20 08:51

Alibaba's Qwen team has unveiled Qwen3.5-LiveTranslate-Flash, a real-time multimodal translation model processing audio and video simultaneously. The model covers 60 input languages and produces speech output in 29 languages at just 2.8 seconds latency. Key features include real-…

LINKS marktechpost.com/…/alibaba-qwen-team-intr…