PulseAugur
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

NVIDIA releases Gated DeltaNet-2 to improve linear attention

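NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve how recurrent models edit their memory. The layer uses separate channel-wise gates to decouple erasing old information from writing new information, addressing a limitation of earlier delta-rule architectures, which control both with a single gate. At 1.3B parameters trained on 100B tokens, Gated DeltaNet-2 outperforms existing models such as Mamba-2 and KDA on long-context retrieval tasks.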
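To make the erase/write distinction concrete, here is a minimal NumPy sketch. It is not the formulation from the NVIDIA paper, which the sources here only summarize. The coupled update is the standard delta rule; the decoupled variant, along with the names `decoupled_step`, `erase`, and `write`, is a hypothetical illustration of what separate channel-wise gates could look like on a fixed-size state.

```python
import numpy as np


def delta_rule_step(S, k, v, beta):
    """Standard delta-rule update: one scalar beta scales both erase and write."""
    old = S @ k  # value currently associated with key k
    return S - beta * np.outer(old, k) + beta * np.outer(v, k)


def decoupled_step(S, k, v, erase, write):
    """Hypothetical decoupled update (an assumption, not the paper's equations).

    `erase` and `write` are per-key-channel gates in [0, 1], so how much old
    content is removed no longer has to match how much new content is written.
    """
    old = S @ k
    return S - np.outer(old, erase * k) + np.outer(v, write * k)


rng = np.random.default_rng(0)
d_k, d_v = 4, 3

# Fixed-size recurrent state: its shape never grows with sequence length.
S = rng.normal(size=(d_v, d_k))
k = rng.normal(size=d_k)
k /= np.linalg.norm(k)  # unit-norm key
v = rng.normal(size=d_v)

# Coupled update with beta = 1 fully overwrites the value stored under k.
S_coupled = delta_rule_step(S, k, v, beta=1.0)

# Decoupled update: erase strongly on some channels, write evenly on all.
erase = np.array([1.0, 1.0, 0.2, 0.2])
write = np.full(d_k, 0.5)
S_decoupled = decoupled_step(S, k, v, erase, write)
```

When `erase` and `write` are both set to the same constant vector, the hypothetical variant reduces to the coupled delta rule, which is one way to see the single-gate design as a special case.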

Impact: Strengthens long-context handling in recurrent models, which may improve performance on complex language tasks.

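Ranking rationale: This cluster describes a new model architecture and its benchmark performance. The model is detailed in an arXiv paper and has been covered by tech news outlets.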

AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Jan Kautz ·

    Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

    Linear attention replaces the unbounded cache of softmax attention with a fixed-size recurrent state, reducing sequence mixing to linear time and decoding to constant memory. The hard part is not just what to forget, but how to edit this compressed memory without scrambling exist…

  2. MarkTechPost TIER_1 English(EN) · Asif Razzaq ·

    NVIDIA AI Releases Gated DeltaNet-2: A Linear Attention Layer That Decouples Erase and Write in the Delta Rule

    Linear attention squeezes the unbounded KV cache into a fixed-size recurrent state, but editing that memory without scrambling existing associations is hard. Prior delta-rule models like Gated DeltaNet and KDA use one scalar gate to control both erasing old content and writing…

  3. Mastodon — fosstodon.org TIER_1 Polski(PL) · [email protected] ·

    NVIDIA presented Gated DeltaNet-2, a new linear attention architecture that drastically improves precision thanks to independent data write and delete gates

    NVIDIA has presented Gated DeltaNet-2, a new linear attention architecture that, thanks to independent write and erase gates, drastically improves the precision of AI models in long contexts. #si #ai #sztucznainteligencja #wiadomości #informacje #technologia https:// a…

  4. Mastodon — fosstodon.org TIER_1 English(EN) · [email protected] ·

    NVIDIA has released Gated DeltaNet-2, a linear attention layer that decouples erasing old content from writing new content via separate channel-wise gates.

    NVIDIA has released Gated DeltaNet-2, a linear attention layer that decouples erasing old content from writing new content via separate channel-wise gates. At 1.3B parameters trained on 100B tokens, it outperforms Mamba-2, Gated DeltaNet and KDA on language modelling and long-con…