Qwen develops FlashQLA for efficient Gated Delta Network attention

By PulseAugur Editorial · [1 source] · 2026-04-28 02:00

Qwen has developed FlashQLA, a set of fused linear attention kernels designed to perform well in both the forward and backward passes. The kernels target Gated Delta Network (GDN) layers, which have become a core component of the Qwen model family, from Qwen3-Next through later iterations such as Qwen3.5 and Qwen3.6. The work aims to improve efficiency and scalability for large models with long context windows.

IMPACT Optimizes the GDN attention kernels used in Qwen's large language models, which could improve training and inference efficiency across the model family.

RANK_REASON The cluster describes a new set of technical kernels for attention mechanisms in deep learning models, presented in a research blog post.



AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

Qwen tech blog TIER_1 English(EN) · QwenTeam · 2026-04-28 02:00

FlashQLA: CP-/Bwd-Friendly Fused Linear Attention Kernels for GDN

Following the release of Qwen3-Next, Gated Delta Network (GDN) has become the workhorse attention layer across the Qwen family — from Qwen3-Next-80B-A3B all the way to th…

