Researchers have introduced Key-Value Means (KVM), a new attention mechanism for transformers that supports both fixed-size and growing states. With a fixed-size cache, KVM functions as an O(N) chunked RNN with minimal added parameters. A growable-cache variant is competitive on long-context tasks, offering subquadratic prefill time and sublinear state growth. The approach is compatible with standard attention operations, supports chunk-wise parallelizable training, and provides a flexible trade-off between prefill time complexity and memory usage.
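The summary does not describe the mechanism itself, but the name suggests a cache built from running means of key and value vectors. Below is a minimal NumPy sketch of what a fixed-size variant could look like, assuming each token is folded into the nearest of S mean slots and queries attend over the slot means; the function names, the nearest-slot assignment, and the count-based re-weighting are all illustrative assumptions, not details from the paper.

```python
import numpy as np

def kvm_update(means_k, means_v, counts, keys, values):
    """Fold one chunk of (key, value) pairs into S running-mean slots.
    Hypothetical update rule: assign each key to its nearest slot and
    update that slot's key/value means incrementally (constant memory)."""
    for k, v in zip(keys, values):
        # Nearest key-mean slot for this token (assumed assignment rule).
        s = int(np.argmin(np.linalg.norm(means_k - k, axis=1)))
        counts[s] += 1
        # Incremental running-mean update for both key and value.
        means_k[s] += (k - means_k[s]) / counts[s]
        means_v[s] += (v - means_v[s]) / counts[s]
    return means_k, means_v, counts

def kvm_attention(q, means_k, means_v, counts):
    """Attend over the S slot means instead of all N past tokens.
    log(count) re-weights each slot by how many tokens it absorbed
    (an assumption; empty slots get near-zero weight)."""
    logits = means_k @ q + np.log(np.maximum(counts, 1e-9))
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ means_v

# Usage: stream 1024 tokens through the fixed-size state in 16 chunks.
d, S = 64, 8
rng = np.random.default_rng(0)
means_k = rng.normal(size=(S, d)) * 0.01  # initial slot centroids
means_v = np.zeros((S, d))
counts = np.zeros(S)
for chunk in np.split(rng.normal(size=(1024, 2 * d)), 16):
    keys, values = chunk[:, :d], chunk[:, d:]
    kvm_update(means_k, means_v, counts, keys, values)
out = kvm_attention(rng.normal(size=d), means_k, means_v, counts)
```

Under these assumptions, processing N tokens costs O(N) time with O(S·d) state, consistent with the chunked-RNN framing; a growable variant would presumably add slots over time, trading sublinear state growth for subquadratic prefill.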
IMPACT: Introduces a novel attention mechanism that improves transformer efficiency for long-context tasks.