PulseAugur
OSDN improves linear attention with online preconditioning

Researchers have introduced OSDN, a method that enhances linear attention mechanisms with provable online preconditioning. The technique augments the Delta Rule with a diagonal preconditioner, updated online through hypergradient feedback, which scales the write-side key per feature. Because the preconditioner is diagonal, OSDN preserves the efficient parallel pipeline of DeltaNet without significant overhead, and it shows substantial gains in in-context recall over existing methods at various parameter scales.
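To make the mechanism concrete, here is a minimal sketch of one preconditioned Delta Rule write as the summary describes it: the key is rescaled per feature by a diagonal preconditioner, which is itself adapted by a hypergradient step. The function name, the sign-based hypergradient update, and the parameterization are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def osdn_step(S, k, v, d, beta=1.0, eta=1e-2):
    """One hypothetical OSDN-style write (illustrative, not the paper's exact update).

    S:    (d_v, d_k) fast-weight state matrix
    k, v: (d_k,) key and (d_v,) value for the current token
    d:    (d_k,) diagonal preconditioner scaling the write-side key per feature
    """
    k_pre = d * k                            # precondition the write-side key
    err = v - S @ k_pre                      # Delta Rule prediction error
    S_new = S + beta * np.outer(err, k_pre)  # one step of online gradient descent
    # Hypergradient feedback: for loss 0.5*||v - S(d*k)||^2, the gradient
    # w.r.t. d is -(S^T err) * k; a sign-based step keeps the update bounded.
    hypergrad = -(S.T @ err) * k
    d_new = d - eta * np.sign(hypergrad)
    return S_new, d_new
```

With a unit-norm key, identity preconditioner, and beta = 1, a single write stores the value exactly, which is the associative-recall behavior the Delta Rule is designed for; the diagonal update only adds an elementwise rescale, consistent with the claim that the parallel pipeline is preserved.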

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Introduces a new technique to improve in-context recall in linear attention models, potentially enhancing their ability to handle long sequences.

RANK_REASON The cluster contains an academic paper detailing a new method for improving linear attention mechanisms.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Yinyu Ye

    OSDN: Improving Delta Rule with Provable Online Preconditioning in Linear Attention

    Linear attention and state-space models offer constant-memory alternatives to softmax attention, but often struggle with in-context associative recall. The Delta Rule mitigates this by writing each token via one step of online gradient descent. However, its step size relies on a …