PulseAugur
New attention methods tackle long-context challenges in large language models

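Researchers are developing new attention mechanisms to handle the ever-longer contexts of large language models. One approach, Runtime-Certified Bounded-Error Quantized Attention, uses a tiered KV cache to compress memory while guaranteeing a fallback to exact attention, preserving quality on tasks such as language modeling and retrieval. Another, DashAttention, uses differentiable sparse hierarchical attention to adaptively select relevant tokens, reaching high sparsity at accuracy comparable to full attention and outperforming existing hierarchical methods. Variational Linear Attention (VLA) recasts linear attention as a regularized least-squares problem, which bounds the growth of the state norm and improves associative-recall accuracy while also delivering a significant speedup.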
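The "compress, but fall back to exact attention" idea can be illustrated with a toy example. The sketch below is not the paper's algorithm: it assumes plain uniform rounding of keys, a simple worst-case bound on the score error, and hypothetical names (`quantize`, `score_error_bound`, `attend`, `tolerance`). It only shows how a bound computed at runtime can decide between the quantized and the exact path.

```python
import math

def quantize(vec, scale):
    """Round each entry to the nearest multiple of `scale` (uniform quantization)."""
    return [round(x / scale) * scale for x in vec]

def score_error_bound(query, scale):
    """Worst-case |q.k - q.k_hat|: every key entry moves by at most scale / 2."""
    return sum(abs(q) for q in query) * scale / 2

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, scale, tolerance):
    """Attend over quantized keys when the error bound allows it, else over exact keys.

    Assumption: a real system would store the quantized keys and keep the exact
    ones in a slower tier; here they are quantized on the fly for brevity.
    """
    bound = score_error_bound(query, scale)
    used_exact = bound > tolerance  # the runtime check that triggers the fallback
    active_keys = keys if used_exact else [quantize(k, scale) for k in keys]
    scores = [sum(q * k for q, k in zip(query, key)) for key in active_keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return output, used_exact, bound
```

Calling `attend(q, keys, values, scale=0.1, tolerance=0.2)` with `q = [1.0, -2.0]` stays on the quantized path because the bound is about 0.15. Tightening `tolerance` to 0.1 forces the exact path.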

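Impact: These advances in attention mechanisms are expected to significantly improve the efficiency and capability of large language models in processing and understanding long contexts.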

Ranking rationale: This cluster contains several research papers detailing novel attention mechanisms for large language models.

AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.AI TIER_1 English(EN) · Dean Calver ·

    Runtime-Certified Bounded-Error Quantized Attention

    arXiv:2605.20868v1. KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to d…

  2. arXiv cs.AI TIER_1 English(EN) · Dean Calver ·

    Runtime-Certified Bounded-Error Quantized Attention

    KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We pres…

  3. arXiv cs.AI TIER_1 English(EN) · Marcos V. Treviso ·

    DashAttention: Differentiable and Adaptive Sparse Hierarchical Attention

    Current hierarchical attention methods, such as NSA and InfLLMv2, select the top-k relevant key-value (KV) blocks based on coarse attention scores and subsequently apply fine-grained softmax attention on the selected tokens. However, the top-k operation assumes the number of rele…

  4. Hugging Face Daily Papers TIER_1 English(EN) ·

    Variational Linear Attention: Stable Associative Memory for Long-Context Transformers

    Linear attention reduces the quadratic cost of softmax attention to $\mathcal{O}(T)$, but its memory state grows as $\mathcal{O}(T)$ in Frobenius norm, causing progressive interference between stored associations. We introduce Variational Linear Attention (VLA), which re…

  5. dev.to — LLM tag TIER_1 English(EN) · Jayavelu Balaji ·

    Sub-Quadratic Sparse Attention: How SSA Solves the Long-Context Problem

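The state growth mentioned in the Variational Linear Attention abstract above can be seen in a toy setting. The sketch below assumes the simplest unnormalized linear-attention memory, where the state is the running sum of outer products of values and keys. It is not VLA's update rule, and the helper names (`additive_state`, `read`, `frobenius_norm`) are hypothetical. Writing the same unit-norm association T times is the worst case: the Frobenius norm of the state grows linearly in T.

```python
import math

def outer(v, k):
    """Outer product v k^T as a list of rows."""
    return [[vi * kj for kj in k] for vi in v]

def frobenius_norm(matrix):
    return math.sqrt(sum(x * x for row in matrix for x in row))

def additive_state(keys, values):
    """Unnormalized linear-attention memory: S = sum over t of v_t k_t^T."""
    state = [[0.0] * len(keys[0]) for _ in range(len(values[0]))]
    for k, v in zip(keys, values):
        update = outer(v, k)
        for row_s, row_u in zip(state, update):
            for j, u in enumerate(row_u):
                row_s[j] += u
    return state

def read(state, query):
    """Recall from memory: S q."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]

# Worst case: the same unit-norm key/value pair written T times.
T = 100
keys = [[1.0, 0.0]] * T
values = [[0.0, 1.0]] * T
S = additive_state(keys, values)
print(frobenius_norm(S))    # 100.0, i.e. linear in T
print(read(S, [1.0, 0.0]))  # [0.0, 100.0], not the stored value [0.0, 1.0]
```

The recalled vector is scaled by T instead of matching the stored value, which is one simple way an unbounded state can distort recall. A regularized formulation, as the abstract describes, aims to keep the state norm under control.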