Blurry Window Attention
Researchers have introduced Blurry Window Attention (BLA), a method for making Transformer language models more efficient on long contexts. Standard softmax attention has a compute cost that is quadratic in sequence length and a KV cache that grows with every token; BLA addresses both by reconstructing a blurry KV history from a frequency window using Dirichlet kernels. The approach is reported to be more state-efficient than Sliding Window Attention while remaining competitive with other linear attention models on tasks requiring information retrieval.
AI IMPACT: Introduces a more efficient attention mechanism for handling long sequences in language models.
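
The summary does not spell out how BLA builds its blurry history, so the sketch below is only a generic signal-processing illustration of the underlying idea, not the authors' algorithm. It shows a standard identity: keeping a window of the lowest DFT frequencies of a sequence and inverting gives the same result as circularly convolving the sequence with a Dirichlet kernel. The 1-D scalar "history", the function names, and the choice of a discrete Fourier basis are all assumptions made for illustration.

```python
import cmath
import math


def dft(x):
    """Plain O(n^2) discrete Fourier transform of a real sequence."""
    n = len(x)
    return [
        sum(x[s] * cmath.exp(-2j * math.pi * k * s / n) for s in range(n))
        for k in range(n)
    ]


def lowpass_reconstruct(x, num_freqs):
    """Keep only DFT frequencies |k| <= num_freqs, then invert.

    Only 2 * num_freqs + 1 coefficients are needed, however long x is.
    Assumes 2 * num_freqs + 1 <= len(x).
    """
    n = len(x)
    spectrum = dft(x)
    # Index k and index n - k represent frequencies +k and -k.
    kept = [k for k in range(n) if min(k, n - k) <= num_freqs]
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in kept).real / n
        for t in range(n)
    ]


def dirichlet_kernel(t, n, num_freqs):
    """D_K(t) = sum_{k=-K..K} exp(2*pi*i*k*t/n) = 1 + 2 * sum_{k=1..K} cos(2*pi*k*t/n)."""
    return 1.0 + 2.0 * sum(
        math.cos(2 * math.pi * k * t / n) for k in range(1, num_freqs + 1)
    )


def dirichlet_blur(x, num_freqs):
    """Circularly convolve x with the Dirichlet kernel (normalised by n)."""
    n = len(x)
    return [
        sum(x[s] * dirichlet_kernel(t - s, n, num_freqs) for s in range(n)) / n
        for t in range(n)
    ]


# Hypothetical 1-D "history": a slow wave plus one sharp spike at t = 20.
history = [math.sin(0.3 * t) + (1.0 if t == 20 else 0.0) for t in range(32)]
blurred_from_window = lowpass_reconstruct(history, num_freqs=3)
blurred_from_kernel = dirichlet_blur(history, num_freqs=3)
```

The two reconstructions agree up to floating-point error, and the spike at position 20 comes back smeared over its neighbours, which is the sense in which such a reconstruction is blurry. How BLA applies this to keys and values, and how it updates the window as tokens arrive, is not described in the summary and is left open here.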