Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 4d · [4 sources]

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

NVIDIA has introduced Gated DeltaNet-2, a linear attention layer designed to improve how recurrent models edit their fixed-size memory. Earlier delta-rule architectures let a single gate control both erasing old information and writing new information. Gated DeltaNet-2 decouples the two with separate channel-wise erase and write gates. In experiments with 1.3-billion-parameter models trained on 100 billion tokens, Gated DeltaNet-2 outperforms existing recurrent baselines such as Mamba-2 and Kimi Delta Attention (KDA), with especially strong results on long-context retrieval tasks.
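The brief does not give the layer's update equations. As a rough illustration only, the sketch below contrasts the gated delta rule used in the original Gated DeltaNet, S_t = S_{t-1}(α_t(I − β_t k_t k_tᵀ)) + β_t v_t k_tᵀ, where one scalar β_t scales both the erase term and the write term, with a *hypothetical* decoupled variant. In that variant, a per-key-channel `erase` gate and a per-value-channel `write` gate replace the shared β. The decoupled formula, the gate names, and their shapes are assumptions made for this example and are not NVIDIA's published formulation. Plain Python lists keep the sketch dependency-free.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # memory state S with shape (d_v, d_k)


def matvec(S: Matrix, x: Vector) -> Vector:
    """Return S @ x, i.e. read the memory with a d_k-dimensional key or query."""
    return [sum(s_ij * x_j for s_ij, x_j in zip(row, x)) for row in S]


def gated_delta_step(S: Matrix, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Matrix:
    """One token of the gated delta rule: a single scalar beta couples erase and write.

    S_t = alpha * (S - beta * (S k) k^T) + beta * v k^T
    """
    Sk = matvec(S, k)  # value currently stored under key k
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def decoupled_delta_step(S: Matrix, k: Vector, v: Vector, alpha: float,
                         erase: Vector, write: Vector) -> Matrix:
    """HYPOTHETICAL decoupled update with separate channel-wise gates (an assumption).

    erase: gate in [0, 1] per key channel, shape (d_k,)
    write: gate in [0, 1] per value channel, shape (d_v,)
    S_t = alpha * (S - (S k)(erase * k)^T) + (write * v) k^T
    With erase = beta and write = beta in every channel, this reduces to gated_delta_step.
    """
    Sk = matvec(S, k)
    return [
        [alpha * (S[i][j] - Sk[i] * erase[j] * k[j]) + write[i] * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


if __name__ == "__main__":
    k = [1.0, 0.0]                   # unit-norm key
    S0 = [[0.0, 0.0], [0.0, 0.0]]    # empty memory
    ones = [1.0, 1.0]
    S1 = decoupled_delta_step(S0, k, [1.0, 2.0], 1.0, erase=ones, write=ones)
    print(matvec(S1, k))  # [1.0, 2.0]

    # Overwrite the value stored under k: full erase, then full write.
    S2 = decoupled_delta_step(S1, k, [3.0, 4.0], 1.0, erase=ones, write=ones)
    print(matvec(S2, k))  # [3.0, 4.0]

    # Write without erasing: the old and new values are superposed.
    S3 = decoupled_delta_step(S1, k, [3.0, 4.0], 1.0, erase=[0.0, 0.0], write=ones)
    print(matvec(S3, k))  # [4.0, 6.0]
```

The last two calls show why independent gates can matter. With a shared β, a layer cannot fully write a new value while keeping the old one, or fully erase while writing weakly. Separate gates make those combinations expressible, and per-channel gates let different key and value dimensions make that choice independently.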

IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.

Mamba-3
Gated DeltaNet-2
Gated DeltaNet
Kimi Delta Attention
Mamba-2
Delta Rule
Linear Attention
NVIDIA