Alibaba's Qwen3.5-Omni adds audio and video to multimodal LLM capabilities

By PulseAugur Editorial · [1 source] · 2026-03-29 20:00

Alibaba's Qwen team has released Qwen3.5-Omni, a new generation of omnimodal large language models that understand text, images, audio, and audio-visual content. The series comprises three models, Plus, Flash, and Light, all of which support a 256k context window and can handle more than 10 hours of audio. Both the reasoning component (the Thinker) and the generation component (the Talker) use a Hybrid-Attention Mixture-of-Experts (MoE) architecture.

IMPACT Expands LLM capabilities into native audio and video processing, potentially enabling more sophisticated AI agents and applications.

RANK_REASON Frontier-lab model release with system card.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Qwen tech blog · TIER_1 · English (EN) · QwenTeam · 2026-03-29 20:00

Qwen3.5-Omni: Scaling Up, Toward Native Omni-Modal AGI

Qwen3.5-Omni is Qwen’s latest generation of fully omnimodal LLM, supporting the understanding of text, images, audio, and audio-visual content. Both the Thinker and Talker in Qwen3.5-Omni adopt the Hybrid-Attention MoE. Qwen3.5-Omni series includes Instruct versions in three size…
