PulseAugur

New Erase-then-Delta Attention Enhances Recurrent Memory Models

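Researchers have introduced Erase-then-Delta Attention (EDA), a new memory update rule designed to enhance recurrent memory models. Unlike earlier delta-rule methods, which anchor the correction to the write address, EDA decouples the erase and write operations, so stale information stored at a separate address can be actively suppressed before new content is written. This dual capability expands the model's memory-management capacity and proved effective in language-model pretraining experiments on both dense and Mixture-of-Experts (MoE) architectures. EDA also performs strongly on long-context evaluations and keeps its advantage even after extensive mid-training.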
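As a rough illustration of the difference described above, the sketch below contrasts a standard delta-rule update with an erase-then-write step in pure Python. The delta-rule update S ← S − β(Sk − v)kᵀ is the usual formulation; the erase step, its gate `alpha`, the erase address `e`, and all function names are assumptions made for illustration only, since the summary does not give EDA's actual equations.

```python
# Toy sketch, NOT the paper's implementation.
# The memory state S is a d_v x d_k matrix stored as a list of rows.
# Keys and erase addresses are assumed to be unit-norm vectors.

def read(S, k):
    """Return S @ k: the value currently stored at address k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]

def delta_step(S, k, v, beta):
    """Standard delta rule: S <- S - beta * (S k - v) k^T.
    The correction is tied to the write address k."""
    old = read(S, k)
    return [[s_ij - beta * (o_i - v_i) * k_j for s_ij, k_j in zip(row, k)]
            for row, o_i, v_i in zip(S, old, v)]

def erase_step(S, e, alpha):
    """Hypothetical erase: S <- S - alpha * (S e) e^T.
    Suppresses whatever is stored at a separate address e."""
    stale = read(S, e)
    return [[s_ij - alpha * st_i * e_j for s_ij, e_j in zip(row, e)]
            for row, st_i in zip(S, stale)]

def erase_then_delta_step(S, e, k, v, alpha, beta):
    """Assumed composition: erase at e first, then delta-write at k."""
    return delta_step(erase_step(S, e, alpha), k, v, beta)

# Two orthogonal addresses in a 2 x 2 memory.
k1, k2 = [1.0, 0.0], [0.0, 1.0]
S0 = [[0.0, 0.0], [0.0, 0.0]]

# Store a value at k1 that later becomes stale.
S1 = delta_step(S0, k1, [5.0, 5.0], beta=1.0)

# Plain delta rule writing at k2 cannot touch what sits at k1.
plain = delta_step(S1, k2, [1.0, 2.0], beta=1.0)

# Erase-then-delta clears k1 and writes at k2 in the same step.
eda = erase_then_delta_step(S1, k1, k2, [1.0, 2.0], alpha=1.0, beta=1.0)

print(read(plain, k1))  # stale value is still readable
print(read(eda, k1))    # stale value has been suppressed
print(read(eda, k2))    # new value is stored at the write address
```

In this toy setting the two addresses are orthogonal, so a delta-rule write at `k2` leaves the content at `k1` untouched, while the separate erase address removes it. In a real model the addresses and gates would be learned projections of the input rather than fixed vectors.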

Impact: This new attention mechanism could improve the efficiency and long-context capabilities of future language models.

Ranking rationale: This cluster contains a paper detailing a new approach to attention mechanisms in language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Xiao Li, Chengruidong Zhang, Hao Luo, Xi Lin, Zekun Wang, Zihan Qiu, Yunfei Mao, Langshi Chen, Man Yuan, Minmin Sun, Huiqiang Jiang, Siqi Zhang, Rui Men, Wei Hu, Gong Cheng, Bo Zheng, Dayiheng Liu, Jingren Zhou ·

    Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

    arXiv:2606.26560v1 Announce Type: new Abstract: Delta-rule linear attention improves recurrent memory updates by correcting what is already stored at the current write address before writing new content. However, the active correction is still anchored to that same write address.…

  2. arXiv cs.CL TIER_1 English(EN) · Jingren Zhou ·

    Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

    Delta-rule linear attention improves recurrent memory updates by correcting what is already stored at the current write address before writing new content. However, the active correction is still anchored to that same write address. As a result, stale information stored at a diff…