New technique sharply reduces the I/O cost of attention in large language models

By the PulseAugur editorial team · [2 sources] · 2026-05-22 15:23

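Researchers have developed a new technique that significantly reduces the I/O complexity of the attention mechanism in large language models. The method aims to minimize data transfer between fast memory and slow memory, a key factor in how efficiently these models run. The new approach achieves an I/O cost that is near-linear in the input size, a substantial improvement over the existing quadratic cost, and it is inspired by recent approximate attention frameworks.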

Impact: Lowers the overhead of the attention mechanism, which may enable larger models or faster inference.

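Ranking rationale: This cluster contains an academic paper that details a new technical method for improving the efficiency of large language models.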

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.LG TIER_1 English(EN) · Pál András Papp, Aleksandros Sobczyk, Anastasios Zouzias · 2026-05-25 04:00

Approaching I/O-optimality for Approximate Attention

arXiv:2605.23751v1 Announce Type: new Abstract: We revisit the I/O complexity of attention in large language models. Given query-key-value matrices $Q,K,V\in\mathbb{R}^{n\times d}$, and a machine with fast memory size $M$, the goal is to compute the "attention matrix" $A=\text{so…
arXiv cs.LG TIER_1 English(EN) · Anastasios Zouzias · 2026-05-22 15:23

Approaching I/O-optimality for Approximate Attention

We revisit the I/O complexity of attention in large language models. Given query-key-value matrices $Q,K,V\in\mathbb{R}^{n\times d}$, and a machine with fast memory size $M$, the goal is to compute the "attention matrix" $A=\text{softmax}(Q K ^{\top}/\sqrt{d}) V$ with the minimal…
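
To make the I/O setting concrete, here is a minimal Python sketch of the exact attention formula quoted in the abstract, computed once by materializing the full $n\times n$ score matrix and once block by block with a running softmax. This is the well-known tiling idea used by exact I/O-aware attention kernels, not the approximate method proposed in the paper. The function names, the `block` parameter, and the transfer-counting model are illustrative assumptions.

```python
import numpy as np

def attention_reference(Q, K, V):
    """Exact softmax(Q K^T / sqrt(d)) V, materializing the n x n score matrix."""
    d = Q.shape[1]
    S = Q @ K.T / np.sqrt(d)
    S = S - S.max(axis=1, keepdims=True)  # subtract row maxima for numerical stability
    P = np.exp(S)
    return (P / P.sum(axis=1, keepdims=True)) @ V

def attention_tiled(Q, K, V, block=4):
    """Same result, computed tile by tile with a running (online) softmax.

    Also returns a count of matrix entries moved between slow and fast memory
    under a toy model (an assumption for illustration): each Q block is loaded
    once, every K and V block is reloaded for every Q block, and each output
    block is written back once.
    """
    n, d = Q.shape
    out = np.empty_like(V, dtype=float)
    transfers = 0
    for i in range(0, n, block):
        Qi = Q[i:i + block]
        transfers += Qi.size                   # load one block of queries
        m = np.full(Qi.shape[0], -np.inf)      # running row maxima
        l = np.zeros(Qi.shape[0])              # running softmax denominators
        acc = np.zeros((Qi.shape[0], d))       # running unnormalized output
        for j in range(0, n, block):
            Kj, Vj = K[j:j + block], V[j:j + block]
            transfers += Kj.size + Vj.size     # load one block of keys and values
            S = Qi @ Kj.T / np.sqrt(d)
            m_new = np.maximum(m, S.max(axis=1))
            scale = np.exp(m - m_new)          # rescale earlier partial sums
            P = np.exp(S - m_new[:, None])
            l = l * scale + P.sum(axis=1)
            acc = acc * scale[:, None] + P @ Vj
            m = m_new
        out[i:i + block] = acc / l[:, None]
        transfers += Qi.shape[0] * d           # write one output block back
    return out, transfers

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
out, transfers = attention_tiled(Q, K, V, block=4)
print(np.allclose(out, attention_reference(Q, K, V)), transfers)
```

In this toy model the K and V blocks are reloaded once per query block, so the transfer count still grows quadratically in $n$ for a fixed block size. That quadratic term is the cost the summary above describes the paper as reducing to near-linear for approximate attention.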