PulseAugur
Live 14:31:47

Key-Value Means attention offers O(N) transformer performance

Researchers have introduced Key-Value Means (KVM), a block-recurrent attention mechanism for transformers whose state can be either fixed-size or growing. With a fixed-size cache, KVM behaves as an O(N) chunked RNN while adding only a negligible number of parameters. A variant with a growable KVM cache performs competitively on long-context tasks, with subquadratic prefill time and sublinear state growth. The approach is compatible with standard operations, supports chunk-wise parallelizable training, and offers a tunable trade-off between prefill time complexity and memory usage.
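The summary does not describe how KVM actually builds or updates its state. To illustrate why a fixed-size cache turns chunked attention into an O(N) recurrence, here is a minimal pure-Python sketch under loudly stated assumptions: each cache slot holds the running mean of the keys and values merged into it, new tokens are merged into the most similar slot by dot product, and each slot gets a log(count) score bias so it weighs roughly like the tokens it replaces. `MeanKVCache`, `chunked_mean_attention`, and this merge rule are hypothetical and are not the paper's algorithm.

```python
import math
from typing import List, Tuple

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax_mix(q: Vec, keys: List[Vec], vals: List[Vec], biases: List[float]) -> Vec:
    """Scaled softmax attention of one query over (key, value, bias) triples."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale + b for k, b in zip(keys, biases)]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    out = [0.0] * len(vals[0])
    for w, v in zip(weights, vals):
        for i, x in enumerate(v):
            out[i] += w * x / z
    return out


class MeanKVCache:
    """Hypothetical fixed-size state: each slot keeps the running mean of the
    keys and values merged into it, plus how many tokens it has absorbed."""

    def __init__(self, num_slots: int):
        self.num_slots = num_slots
        self.keys: List[Vec] = []
        self.vals: List[Vec] = []
        self.counts: List[int] = []

    def absorb(self, k: Vec, v: Vec) -> None:
        if len(self.keys) < self.num_slots:
            # Fill empty slots first.
            self.keys.append(list(k))
            self.vals.append(list(v))
            self.counts.append(1)
            return
        # Assumed merge rule: pick the slot whose mean key best matches k.
        j = max(range(self.num_slots), key=lambda s: dot(self.keys[s], k))
        n = self.counts[j] + 1
        # Incremental mean update: mean += (x - mean) / n.
        self.keys[j] = [a + (b - a) / n for a, b in zip(self.keys[j], k)]
        self.vals[j] = [a + (b - a) / n for a, b in zip(self.vals[j], v)]
        self.counts[j] = n


def chunked_mean_attention(
    Q: List[Vec], K: List[Vec], V: List[Vec], chunk: int, num_slots: int
) -> Tuple[List[Vec], MeanKVCache]:
    cache = MeanKVCache(num_slots)
    out: List[Vec] = []
    for start in range(0, len(Q), chunk):
        end = min(start + chunk, len(Q))
        for t in range(start, end):
            # Attend to the cached means (bias log(count)) plus the causal
            # keys inside the current chunk: at most num_slots + chunk keys.
            keys = cache.keys + K[start:t + 1]
            vals = cache.vals + V[start:t + 1]
            biases = [math.log(c) for c in cache.counts] + [0.0] * (t + 1 - start)
            out.append(softmax_mix(Q[t], keys, vals, biases))
        # Fold the finished chunk into the fixed-size state (the recurrence).
        for t in range(start, end):
            cache.absorb(K[t], V[t])
    return out, cache
```

In this sketch each query touches at most `num_slots + chunk` keys, so total work grows linearly with sequence length. Queries within one chunk depend only on the state left by earlier chunks and on in-chunk keys, so they could be computed in parallel, which mirrors the chunk-wise parallelism the summary mentions. When the whole sequence fits in one chunk, the output reduces to ordinary causal softmax attention.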

Impact: Introduces a novel attention mechanism that improves transformer efficiency on long-context tasks.

Ranking rationale: Publication of an academic paper detailing a new model architecture.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CL · Tier 1 · Eugene Cheah

    Key-Value Means

    We present Key-Value Means ("KVM"), a novel block-recurrence for attention that can accommodate either fixed-size or growing state. Equipping a strong transformer baseline with fixed-size KVM attention layers yields a strong $O(N)$ chunked RNN, while adding only an insignificant …
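Why the size of the state controls prefill cost can be seen with a rough count. The symbols below are introduced here for illustration: $d$ is the head dimension, $C$ the chunk length, $M$ a fixed number of cache entries, and $S(t)$ the number of cache entries visible to token $t$. The growth law $S(t) = t^{\alpha}$ is purely an assumption; the summary does not say how fast KVM's growable state actually grows.

```latex
% Per-token attention cost ~ (cache entries + in-chunk keys) * d
T(N) \approx \sum_{t=1}^{N} \bigl(S(t) + C\bigr)\, d

% Fixed-size state: S(t) = M
T(N) \approx N\,(M + C)\, d = O(N), \qquad \text{memory } O(M)

% Assumed growable state: S(t) = t^{\alpha},\; 0 < \alpha < 1
T(N) \approx d \sum_{t=1}^{N} t^{\alpha} + N C d
     \approx \frac{d\, N^{1+\alpha}}{1+\alpha} + N C d
     = O\!\left(N^{1+\alpha}\right), \qquad \text{memory } O\!\left(N^{\alpha}\right)

% alpha -> 1 recovers full attention: O(N^2) prefill with an O(N) KV cache
```

Under this assumption, any sublinear growth exponent $\alpha < 1$ gives subquadratic prefill, and choosing $\alpha$ (or $M$ in the fixed case) trades prefill time against memory, which is the kind of trade-off the summary describes.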