Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

Blurry Window Attention

Researchers have introduced Blurry Window Attention (BLA), a method for making Transformer language models more efficient on long contexts. Standard Softmax Attention has compute cost that grows quadratically with sequence length and a KV cache that grows with every token. BLA sidesteps both by reconstructing a blurry version of the KV history from a frequency window using Dirichlet kernels. According to the authors, the approach is more state-efficient than Sliding Window Attention and remains competitive with other linear attention models on tasks that require information retrieval.
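The link between a "frequency window" and Dirichlet kernels can be illustrated with a small NumPy sketch. This is not the paper's implementation; it only demonstrates the standard signal-processing fact that keeping the 2M+1 lowest Fourier frequencies of a sequence and inverting the transform is the same as circularly convolving the sequence with a Dirichlet kernel, which yields a blurred copy of the original. The function names, the sequence length `T`, and the half-width `M` are hypothetical choices made for illustration.

```python
import numpy as np


def dirichlet_kernel(T, M):
    """Periodic Dirichlet kernel D[j] = sum_{k=-M..M} exp(2*pi*i*k*j/T), j = 0..T-1."""
    j = np.arange(T)
    k = np.arange(-M, M + 1)
    # The imaginary parts cancel because the frequency range is symmetric.
    return np.exp(2j * np.pi * np.outer(j, k) / T).sum(axis=1).real


def blurry_reconstruction(x, M):
    """Keep only DFT frequencies with |k| <= M along axis 0, then invert.

    x is a real array of shape (T, d), for example T key vectors of width d.
    Assumes 2*M + 1 <= T so the kept window is symmetric around frequency 0.
    """
    T = x.shape[0]
    X = np.fft.fft(x, axis=0)
    k = np.arange(T)
    signed_k = np.where(k <= T // 2, k, k - T)  # map DFT bins to signed frequencies
    X[np.abs(signed_k) > M] = 0                 # discard everything outside the window
    return np.fft.ifft(X, axis=0).real


# Illustration: 7 retained frequencies stand in for 32 stored positions.
T, M, d = 32, 3, 4
rng = np.random.default_rng(0)
history = rng.standard_normal((T, d))
blurry = blurry_reconstruction(history, M)

# The same result, written as a circular convolution with the Dirichlet kernel.
D = dirichlet_kernel(T, M)
shift = (np.arange(T)[:, None] - np.arange(T)[None, :]) % T
via_kernel = (D[shift] / T) @ history
```

In this toy setting the retained state is 2M+1 Fourier coefficients per channel rather than T stored positions, which gives a rough sense of where a state-efficiency gain could come from; how BLA actually builds, updates, and queries its state is described in the paper, not here.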

IMPACT Introduces a more efficient attention mechanism for handling long sequences in language models.

Transformer
Softmax Attention
Linear Attention
Dirichlet kernels
State-Space Models
Sliding Window Attention
Blurry Window Attention
Attention with Bounded-memory Control
RegBench
Multi-Query Associative Recall