PulseAugur

NVIDIA unveils Gated DeltaNet-2 for improved linear attention

NVIDIA has introduced Gated DeltaNet-2, a linear attention layer designed to improve how a fixed-size recurrent memory is edited. Earlier delta-rule models such as Gated DeltaNet and KDA use a single scalar gate for both erasing old content and writing new content; Gated DeltaNet-2 separates the two with distinct channel-wise gates. At 1.3 billion parameters trained on 100 billion tokens, it outperforms Mamba-2, Gated DeltaNet and KDA on language modelling and long-context retrieval tasks.
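
The erase/write separation can be illustrated with a small, dependency-free sketch. This is not the paper's update rule: the function name `decoupled_delta_step`, the gate vectors `erase` and `write`, and the exact form of the update are assumptions chosen only to show the idea. Setting both gates to the same scalar recovers the classic delta rule, where one strength controls both operations.

```python
# Hypothetical sketch of a delta-rule memory update with separate
# per-channel erase and write gates. Not the published formulation.

def outer(a, b):
    """Outer product of two vectors as a list of rows."""
    return [[x * y for y in b] for x in a]

def matvec(S, k):
    """Read from the memory: multiply matrix S by key vector k."""
    return [sum(s * x for s, x in zip(row, k)) for row in S]

def decoupled_delta_step(S, k, v, erase, write):
    """One recurrent update of a (d_v x d_k) memory matrix S.

    erase, write: gates in [0, 1], one value per key channel (assumed).
    """
    old = matvec(S, k)                          # value currently stored under k
    ek = [e * x for e, x in zip(erase, k)]      # key scaled by the erase gate
    wk = [w * x for w, x in zip(write, k)]      # key scaled by the write gate
    remove = outer(old, ek)                     # what gets erased
    add = outer(v, wk)                          # what gets written
    return [[s - r + a for s, r, a in zip(rs, rr, ra)]
            for rs, rr, ra in zip(S, remove, add)]

k = [1.0, 0.0]                                  # unit-norm key
S = [[0.0, 0.0], [0.0, 0.0]]                    # empty memory

# Write a value under k, then read it back.
S = decoupled_delta_step(S, k, [3.0, 4.0], erase=[1.0, 1.0], write=[1.0, 1.0])
print(matvec(S, k))   # [3.0, 4.0]

# Erase without writing: the write gate is closed, so the new value is ignored.
S = decoupled_delta_step(S, k, [9.0, 9.0], erase=[1.0, 1.0], write=[0.0, 0.0])
print(matvec(S, k))   # [0.0, 0.0]
```

With a single shared gate, the second step could not clear the slot without also writing the new value at the same strength; separate gates make the two choices independent.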

IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.

RANK_REASON The cluster describes a new model architecture and its performance on benchmarks, detailed in an arXiv paper and covered by tech news outlets.


AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv cs.AI TIER_1 English(EN) · Jan Kautz ·

    Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

    Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling exist…

  2. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

    Linear attention squeezes the unbounded KV cache into a fixed-size recurrent state, but editing that memory without scrambling existing associations is hard. Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to control both erasing old content and writing…

  3. Mastodon — fosstodon.org TIER_1 Polski(PL) · [email protected] ·

    NVIDIA presented Gated DeltaNet-2, a new linear attention architecture that drastically improves precision thanks to independent data write and delete gates

    NVIDIA has presented Gated DeltaNet-2, a new linear attention architecture that, thanks to independent write and erase gates, drastically improves the precision of AI models in long contexts. #si #ai #sztucznainteligencja #wiadomości #informacje #technologia https:// a…

  4. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    NVIDIA has released Gated DeltaNet-2, a linear attention layer that decouples erasing old content from writing new content via separate channel-wise gates.

    NVIDIA has released Gated DeltaNet-2, a linear attention layer that decouples erasing old content from writing new content via separate channel-wise gates. At 1.3B parameters trained on 100B tokens, it outperforms Mamba-2, Gated DeltaNet and KDA on language modelling and long-con…
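
The contrast in the first coverage item between an unbounded softmax-attention cache and a fixed-size recurrent state comes down to simple arithmetic. The sketch below counts stored floats for one attention head; the helper names and the head dimension of 128 are illustrative assumptions, not figures from the paper.

```python
# Illustrative arithmetic only; dimensions and names are hypothetical.

def kv_cache_floats(num_tokens, d_k, d_v):
    """Softmax attention keeps one key and one value vector per past token."""
    return num_tokens * (d_k + d_v)

def recurrent_state_floats(d_k, d_v):
    """Linear attention keeps a single d_v x d_k state, whatever the length."""
    return d_k * d_v

d_k = d_v = 128  # assumed head dimensions
for n in (1_000, 100_000):
    print(n, kv_cache_floats(n, d_k, d_v), recurrent_state_floats(d_k, d_v))
```

The cache grows linearly with context length while the recurrent state stays constant, which is why the quality of each edit to that state matters so much.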