PulseAugur

New LoLA method improves Transformer memory and recall

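Researchers have developed LoLA, a new enhancement for linear attention that significantly improves associative recall and memory capacity in Transformer models. LoLA distributes past key-value pairs across three memory systems: a local sliding window, a sparse global cache for hard-to-memorize key-value pairs, and the recurrent hidden state. The method raises accuracy on passkey retrieval tasks to 97.4% while using a much smaller cache than existing models such as Llama 3.1 8B, and it also outperforms other subquadratic models on commonsense reasoning.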
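The three-tier layout described above can be illustrated with a small toy model. This is a minimal sketch, not the paper's implementation: the class name `ThreeTierMemory`, the ELU+1 feature map, the squared-error score used to decide which pairs are "hard to memorize", and the eviction order are all assumptions made for illustration.

```python
import math
from collections import deque


def phi(x):
    # Positive feature map (ELU + 1); an assumed choice, common in linear attention.
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]


class ThreeTierMemory:
    """Toy key-value store: sliding window -> sparse cache -> recurrent state."""

    def __init__(self, d_k, d_v, window_size, cache_size):
        self.window = deque()   # most recent pairs, kept exactly
        self.cache = []         # older pairs the state recalls poorly, kept exactly
        self.S = [[0.0] * d_v for _ in range(d_k)]  # recurrent state: sum of phi(k) v^T
        self.z = [0.0] * d_k                        # normalizer: sum of phi(k)
        self.window_size = window_size
        self.cache_size = cache_size
        self.absorbed = 0       # number of pairs folded into the recurrent state

    def _recall(self, k):
        # Value that the recurrent state alone would return for key k.
        f = phi(k)
        denom = sum(fi * zi for fi, zi in zip(f, self.z))
        d_v = len(self.S[0])
        if denom == 0.0:
            return [0.0] * d_v
        return [sum(f[i] * self.S[i][j] for i in range(len(f))) / denom
                for j in range(d_v)]

    def _recall_error(self, k, v):
        # Assumed "hard to memorize" score: squared error of the state's recall.
        return sum((p - t) ** 2 for p, t in zip(self._recall(k), v))

    def _absorb(self, k, v):
        # Fold one pair into the fixed-size recurrent state.
        f = phi(k)
        for i, fi in enumerate(f):
            self.z[i] += fi
            for j, vj in enumerate(v):
                self.S[i][j] += fi * vj
        self.absorbed += 1

    def add(self, k, v):
        self.window.append((k, v))
        if len(self.window) <= self.window_size:
            return
        # The oldest windowed pair moves to the sparse cache.
        self.cache.append(self.window.popleft())
        if len(self.cache) <= self.cache_size:
            return
        # Cache is over budget: absorb the pair the state already recalls best.
        errors = [self._recall_error(ck, cv) for ck, cv in self.cache]
        easiest = errors.index(min(errors))
        self._absorb(*self.cache.pop(easiest))


mem = ThreeTierMemory(d_k=2, d_v=2, window_size=2, cache_size=1)
for t in range(6):
    mem.add([float(t), 1.0], [1.0, float(-t)])
print(len(mem.window), len(mem.cache), mem.absorbed)  # 2 1 3
```

The window and cache stay at fixed sizes no matter how many pairs arrive, so memory use is constant in context length; only the choice of which pairs stay in the cache depends on the data.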

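Impact: LoLA's sparse caching and memory management could let Transformers handle longer contexts, potentially opening up new applications in lifelong learning and complex reasoning.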

Ranking rationale: This cluster contains two arXiv papers detailing new research on Transformer attention mechanisms.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.CL TIER_1 English(EN) · Luke McDermott, Robert W. Heath Jr., Rahul Parhi ·

    LoLA: Low-Rank Linear Attention With Sparse Caching

    arXiv:2505.23666v3 Announce Type: replace Abstract: The per-token cost of transformer inference scales with context length, preventing its application to lifelong in-context learning. Linear attention is an efficient alternative that maintains a constant memory footprint, even on…

  2. arXiv cs.CV TIER_1 English(EN) · Chunlu Li, Yixuan Pan, Bai Du, Zhenyuan Chen, Yanzhao Li, Hui Dong, Hui Wang, Zhiqiang Zou ·

    Training-free sparse attention based on cumulative energy filtering

    arXiv:2606.16317v1 Announce Type: new Abstract: Sparse attention accelerates Diffusion Transformers (DiTs) for video generation by computing only the important tokens while skipping the rest. The token selection strategy is key to balancing sparsity and accuracy. We formulate the…

  3. arXiv cs.CV TIER_1 English(EN) · Zhiqiang Zou ·

    Training-free sparse attention based on cumulative energy filtering

    Sparse attention accelerates Diffusion Transformers (DiTs) for video generation by computing only the important tokens while skipping the rest. The token selection strategy is key to balancing sparsity and accuracy. We formulate the token filtering process as a dual-goal optimiza…