PulseAugur

New Erase-then-Delta Attention enhances recurrent memory models

Researchers have introduced Erase-then-Delta Attention (EDA), a memory update rule for recurrent memory models built on delta-rule linear attention. Earlier delta-rule methods anchor the active correction to the current write address. EDA decouples the erase and write operations, so outdated information can be actively suppressed at a separate address before new content is written. The authors report that this decoupling expands the model's memory management capacity and improves results in language model pretraining experiments with both dense and Mixture-of-Experts (MoE) architectures. EDA also outperforms the baselines in long-context evaluations, and the advantage persists after extensive midtraining.
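
To make the difference between the two update rules concrete, here is a minimal pure-Python sketch. It assumes the standard delta-rule form S ← S − β(Sk − v)kᵀ for the write step. The separate erase step, S ← S − α(Se)eᵀ, is an illustrative assumption based only on the summary above and is not the formula from the paper. The function names, the gates `alpha` and `beta`, and the toy addresses are all hypothetical.

```python
# Memory S is a d_v x d_k matrix stored as a list of rows.

def read(S, q):
    """Read from memory at address q: returns S @ q."""
    return [sum(s_ij * q_j for s_ij, q_j in zip(row, q)) for row in S]

def delta_write(S, k, v, beta=1.0):
    """Standard delta rule: correct what is stored at k toward v."""
    pred = read(S, k)
    return [[s_ij - beta * (p_i - v_i) * k_j for s_ij, k_j in zip(row, k)]
            for row, p_i, v_i in zip(S, pred, v)]

def erase(S, e, alpha=1.0):
    """Assumed erase step: suppress whatever is stored at address e."""
    stored = read(S, e)
    return [[s_ij - alpha * st_i * e_j for s_ij, e_j in zip(row, e)]
            for row, st_i in zip(S, stored)]

def erase_then_delta(S, e, k, v, alpha=1.0, beta=1.0):
    """Assumed decoupled update: erase at e, then delta-write v at k."""
    return delta_write(erase(S, e, alpha), k, v, beta)

# Toy setup with two orthonormal addresses.
old_addr, new_addr = [1.0, 0.0], [0.0, 1.0]
empty = [[0.0, 0.0], [0.0, 0.0]]

# Store a value at old_addr that later becomes stale.
S = delta_write(empty, old_addr, [1.0, 2.0])

# Plain delta rule writing at new_addr cannot touch old_addr.
plain = delta_write(S, new_addr, [3.0, 4.0])
stale_after_plain = read(plain, old_addr)

# Decoupled update erases old_addr while writing at new_addr.
decoupled = erase_then_delta(S, old_addr, new_addr, [3.0, 4.0])
stale_after_decoupled = read(decoupled, old_addr)
written = read(decoupled, new_addr)
```

In this toy case the plain delta write leaves the stale value readable at `old_addr`, while the decoupled update clears it and still stores the new value at `new_addr`. With non-orthogonal addresses or fractional gates the two steps interact, which the sketch does not explore.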

IMPACT This new attention mechanism could improve the efficiency and long-context capabilities of future language models.

RANK_REASON The cluster contains a research paper detailing a new method for attention mechanisms in language models.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Xiao Li, Chengruidong Zhang, Hao Luo, Xi Lin, Zekun Wang, Zihan Qiu, Yunfei Mao, Langshi Chen, Man Yuan, Minmin Sun, Huiqiang Jiang, Siqi Zhang, Rui Men, Wei Hu, Gong Cheng, Bo Zheng, Dayiheng Liu, Jingren Zhou ·

    Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

    arXiv:2606.26560v1 · Announce type: new · Abstract: Delta-rule linear attention improves recurrent memory updates by correcting what is already stored at the current write address before writing new content. However, the active correction is still anchored to that same write address.…

  2. arXiv cs.CL TIER_1 English(EN) · Jingren Zhou ·

    Erase-then-Delta Attention: Decoupling Erase and Write Addresses in Delta-Rule Linear Attention

    Delta-rule linear attention improves recurrent memory updates by correcting what is already stored at the current write address before writing new content. However, the active correction is still anchored to that same write address. As a result, stale information stored at a diff…