NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve memory editing in recurrent neural networks. The layer separates erasing old information from writing new information by using distinct channel-wise gates, addressing a limitation of previous delta-rule architectures, which couple the two. A 1.3-billion-parameter model trained on 100 billion tokens outperforms existing models such as Mamba-2 and Kimi Delta Attention (KDA), particularly on long-context retrieval tasks.
AI IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.
RANK_REASON The cluster describes a new model architecture and its performance on benchmarks, detailed in an arXiv paper and covered by tech news outlets.
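The erase/write separation described in the summary can be illustrated with a toy NumPy sketch. This is not the formulation from the paper: the function names `delta_step` and `decoupled_step`, the state layout (keys along rows, values along columns), and the exact way the gates enter the update are all assumptions made for illustration. The first function is the classic delta rule, where one scalar controls both erasing and writing; the second is a hypothetical variant with separate per-channel gates.

```python
import numpy as np

def delta_step(S, k, v, beta):
    """Classic delta rule: a single scalar beta scales both erase and write.

    S: memory state of shape (d_k, d_v); k: unit-norm key; v: value.
    """
    old = k @ S  # value currently stored under key k, shape (d_v,)
    return S - beta * np.outer(k, old) + beta * np.outer(k, v)

def decoupled_step(S, k, v, erase, write):
    """Hypothetical variant (an assumption, not the paper's equations):
    separate per-channel gates, each a vector in [0, 1] of shape (d_k,).
    """
    old = k @ S
    S = S - np.outer(erase * k, old)   # remove part of the old association
    return S + np.outer(write * k, v)  # add the new association

# Toy usage: store v1 under k, then erase it without writing anything new.
k = np.array([3.0, 4.0, 0.0, 0.0]) / 5.0  # unit-norm key
v1 = np.array([1.0, 2.0])
S = delta_step(np.zeros((4, 2)), k, v1, beta=1.0)
S_erased = decoupled_step(S, k, v1, erase=np.ones(4), write=np.zeros(4))
print(k @ S, k @ S_erased)
```

When both gate vectors are set to the same constant, the sketch reduces to the single-gate delta rule; setting them differently lets the memory erase without writing, or write without erasing, which the single-gate form cannot express.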
- Gated DeltaNet
- Gated DeltaNet-2
- Kimi Delta Attention
- Mamba-2
- Mamba-3
- Delta Rule
- Linear Attention
- NVIDIA
AI-generated summary · Google Gemini · from 4 sources.