Brief

last 24h

221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.AI English(EN) · 4d

WriteSAE: Sparse Autoencoders for Recurrent State

Researchers have developed WriteSAE, a sparse autoencoder that operates on the matrix updates written into the recurrent state of language models. It learns rank-1 matrix atoms that directly replace the model's own matrix updates, showing a significant improvement in final-token distribution accuracy. The technique has been applied to models such as Gated DeltaNet and Mamba-2, demonstrating its potential for steering generation and for understanding internal state dynamics.
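To make "rank-1 atoms replacing the model's own matrix updates" concrete, here is a minimal Python sketch under stated assumptions. It is not WriteSAE's method: the atom dictionary is fixed by hand rather than learned, the hypothetical encoder `encode_topk` simply scores each atom u wᵀ by its Frobenius inner product with the update and keeps the top k, and the state recurrence is an ungated additive write S_t = S_{t-1} + ΔS_t. All function names are illustrative.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Atom = Tuple[Vector, Vector]  # rank-1 atom stored as (u, w), i.e. u w^T


def outer(u: Vector, w: Vector) -> Matrix:
    """Rank-1 matrix u w^T."""
    return [[ui * wj for wj in w] for ui in u]


def frob_inner(a: Matrix, b: Matrix) -> float:
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(x * y for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def encode_topk(update: Matrix, atoms: List[Atom], k: int) -> List[Tuple[int, float]]:
    """Hypothetical TopK encoder: score every atom by <update, u w^T> and keep
    the k scores with the largest magnitude. The scores are exact coefficients
    only when the atoms are orthonormal; a real SAE learns its encoder."""
    scores = [(i, frob_inner(update, outer(u, w))) for i, (u, w) in enumerate(atoms)]
    scores.sort(key=lambda s: abs(s[1]), reverse=True)
    return scores[:k]


def decode(codes: List[Tuple[int, float]], atoms: List[Atom], rows: int, cols: int) -> Matrix:
    """Rebuild an update as a sparse sum of rank-1 atoms: sum_i z_i u_i w_i^T."""
    out = [[0.0] * cols for _ in range(rows)]
    for i, z in codes:
        u, w = atoms[i]
        for r in range(rows):
            for c in range(cols):
                out[r][c] += z * u[r] * w[c]
    return out


def apply_write(state: Matrix, update: Matrix) -> Matrix:
    """Assumed ungated recurrence: S_t = S_{t-1} + update."""
    return [[s + d for s, d in zip(rs, rd)] for rs, rd in zip(state, update)]


if __name__ == "__main__":
    e0, e1 = [1.0, 0.0], [0.0, 1.0]
    atoms = [(e0, e0), (e0, e1), (e1, e0), (e1, e1)]  # orthonormal toy dictionary
    write = outer([1.0, 0.0], [0.0, 2.0])  # stands in for the model's own rank-1 write
    codes = encode_topk(write, atoms, k=1)
    print(codes)  # [(1, 2.0)]
    # Substitute the decoded write for the original one...
    state = apply_write([[0.0, 0.0], [0.0, 0.0]], decode(codes, atoms, 2, 2))
    # ...or steer by swapping in a different atom with the same coefficient.
    steered = apply_write([[0.0, 0.0], [0.0, 0.0]], decode([(3, 2.0)], atoms, 2, 2))
    print(state, steered)  # [[0.0, 2.0], [0.0, 0.0]] [[0.0, 0.0], [0.0, 2.0]]
```

With orthonormal atoms the TopK scores are the exact coefficients, so the decoded write reproduces the original. With a learned, overcomplete dictionary that no longer holds, which is why a trained encoder would be needed in practice.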

IMPACT Enables direct intervention and steering of recurrent language model states, potentially leading to more controllable and understandable AI generation.
- Gated DeltaNet
- arXiv
- Mamba-2
- RWKV-7
- WriteSAE

TOOL · r/MachineLearning English(EN) · 1d

PapersWithCode new features - week 1 [P]

Hugging Face has launched new features for PapersWithCode, a platform that tracks state-of-the-art AI results. Leaderboards now support multiple metrics, for example for automatic speech recognition and object detection. The platform also accepts papers from outside arXiv, automatically enriching them with relevant tags and data, and displays paper lineage showing a paper's predecessors and follow-ups.

IMPACT Enhances AI research tracking and sharing capabilities for the community.

RESEARCH · arXiv cs.AI English(EN) · 4d · [4 sources]

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve memory editing in recurrent models. It separates erasing old information from writing new information by using distinct channel-wise gates, addressing a limitation of previous delta-rule architectures. At 1.3 billion parameters trained on 100 billion tokens, Gated DeltaNet-2 outperforms existing models such as Mamba-2 and KDA, particularly on long-context retrieval tasks.
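The erase/write distinction can be sketched in Python. The baseline `gated_delta_step` follows the original Gated DeltaNet recurrence as we understand it, S_t = α_t S_{t-1}(I - β_t k_t k_tᵀ) + β_t v_t k_tᵀ, in which one scalar β_t drives both the erase and the write. The decoupled variant is an assumption for illustration only: the summary does not give Gated DeltaNet-2's equations, so the hypothetical `decoupled_step` uses separate per-value-channel `erase` and `write` gates and should not be read as NVIDIA's formulation. Keys are assumed to be unit-norm.

```python
from typing import List

Matrix = List[List[float]]  # state S, shape d_v x d_k
Vector = List[float]


def matvec(s: Matrix, x: Vector) -> Vector:
    """Read from the state: S x."""
    return [sum(sij * xj for sij, xj in zip(row, x)) for row in s]


def gated_delta_step(s: Matrix, k: Vector, v: Vector, alpha: float, beta: float) -> Matrix:
    """Gated DeltaNet-style step: S_t = alpha * S (I - beta k k^T) + beta v k^T.
    A single scalar beta scales both the erase term and the write term."""
    sk = matvec(s, k)
    return [[alpha * (s[i][j] - beta * sk[i] * k[j]) + beta * v[i] * k[j]
             for j in range(len(k))] for i in range(len(v))]


def decoupled_step(s: Matrix, k: Vector, v: Vector, alpha: float,
                   erase: Vector, write: Vector) -> Matrix:
    """Hypothetical decoupled step (assumed form, not the paper's equations):
    S_t = alpha * (S - (erase * S k) k^T) + (write * v) k^T,
    where erase and write are separate per-value-channel gates in [0, 1]."""
    sk = matvec(s, k)
    return [[alpha * (s[i][j] - erase[i] * sk[i] * k[j]) + write[i] * v[i] * k[j]
             for j in range(len(k))] for i in range(len(v))]


if __name__ == "__main__":
    s = [[1.0, 0.0], [0.0, 1.0]]
    k, v = [1.0, 0.0], [0.5, -0.5]  # unit-norm key, new value
    # With erase == write == beta on every channel, the two steps coincide.
    print(decoupled_step(s, k, v, 1.0, [0.5, 0.5], [0.5, 0.5]) == gated_delta_step(s, k, v, 1.0, 0.5))  # True
    # Erase-only: the value stored under k is removed and nothing is written.
    print(matvec(decoupled_step(s, k, v, 1.0, [1.0, 1.0], [0.0, 0.0]), k))  # [0.0, 0.0]
    # Full erase plus full write: reading at k now returns v exactly.
    print(matvec(decoupled_step(s, k, v, 1.0, [1.0, 1.0], [1.0, 1.0]), k))  # [0.5, -0.5]
```

Setting both gates equal to β recovers the shared-gate step, so in this assumed form the decoupled update contains the original delta rule as a special case while also allowing erase-only or write-only edits.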

IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.
