Brief

last 24h

[5/5] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
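
The feed does not publish its scoring formula, so the following is only a minimal sketch of how four such signals could combine into a 0–100 score. The weights, the 48-hour half-life, and the function name `score` are all hypothetical.

```python
# Hypothetical weights; the feed does not publish its actual formula.
WEIGHTS = {"authority": 0.4, "cluster": 0.3, "headline": 0.3}
HALF_LIFE_HOURS = 48.0  # assumed decay half-life, not taken from the source


def score(authority, cluster, headline, age_hours):
    """Combine three signals in [0, 1] and apply exponential time decay."""
    base = (WEIGHTS["authority"] * authority
            + WEIGHTS["cluster"] * cluster
            + WEIGHTS["headline"] * headline)
    # The score halves every HALF_LIFE_HOURS hours.
    decay = 0.5 ** (age_hours / HALF_LIFE_HOURS)
    return round(100 * base * decay)
```

Under these assumptions a story with perfect signals scores 100 when fresh and 50 after one half-life.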

TOOL · arXiv cs.AI English(EN) · 4d

WriteSAE: Sparse Autoencoders for Recurrent State

Researchers have developed WriteSAE, a sparse autoencoder that operates on the matrix updates written into the state of recurrent language models. It learns rank-1 matrix atoms that can be substituted directly for the model's own updates, and this substitution is reported to improve the accuracy of the final token distribution. The technique has been applied to Gated DeltaNet and Mamba-2, which suggests it could be used to steer generation and to study internal state dynamics.

IMPACT Enables direct intervention and steering of recurrent language model states, potentially leading to more controllable and understandable AI generation.
- RWKV-7
- arXiv
- Mamba-2
- Gated DeltaNet
- WriteSAE
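
To make the idea of replacing a write with a rank-1 atom concrete, here is a toy sketch. It assumes a simplified recurrence `S <- decay * S + write`, which is not the exact Gated DeltaNet or Mamba-2 update, and the `atom` values are invented for illustration rather than learned.

```python
def outer(u, v):
    """Outer product u v^T, which is always a rank-1 matrix."""
    return [[ui * vj for vj in v] for ui in u]


def step(state, write, decay=1.0):
    """One simplified recurrent step: S <- decay * S + write."""
    return [[decay * s + w for s, w in zip(srow, wrow)]
            for srow, wrow in zip(state, write)]


state = [[0.0, 0.0], [0.0, 0.0]]
model_write = outer([1.0, 0.0], [0.5, 0.5])  # stand-in for the model's own write
atom = outer([1.0, 0.0], [0.0, 1.0])         # hypothetical dictionary atom

original = step(state, model_write)  # state after the model's own write
patched = step(state, atom)          # state after substituting the atom
```

Because both the write and the atom are rank-1 matrices of the same shape, the substitution leaves the rest of the recurrence untouched.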
RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

MambaGaze: Bidirectional Mamba with Explicit Missing Data Modeling for Cognitive Load Assessment from Eye-Gaze Tracking Data

Researchers have developed MambaGaze, a framework for assessing cognitive load from eye-gaze tracking data. It uses bidirectional Mamba-2 to model long-range temporal dependencies efficiently and an XMD encoding to handle missing data explicitly, such as gaps caused by blinks. MambaGaze outperformed existing models on benchmark datasets and is feasible for real-time deployment on edge devices such as NVIDIA Jetson platforms.

IMPACT Introduces a novel approach for real-time cognitive load assessment, potentially enabling more responsive human-AI interaction in safety-critical systems.
- CL-Drive
- Transformer
- CNN
- Mamba-2
- NVIDIA Jetson
- Amir Mousavi Seyed
- MambaGaze
- CLARE
- ResNet
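
The summary does not describe how the XMD encoding works. As a generic illustration of explicit missing-data modeling, and not MambaGaze's actual method, one common approach appends a binary flag to each sample so the model can tell a blink from a real zero reading. The function name and the zero-fill choice are assumptions.

```python
def encode_with_mask(samples):
    """Turn gaze samples (x, y) or None into [x, y, missing_flag] rows."""
    encoded = []
    for s in samples:
        if s is None:
            # Missing sample (e.g. a blink): zero-fill and raise the flag.
            encoded.append([0.0, 0.0, 1.0])
        else:
            encoded.append([s[0], s[1], 0.0])
    return encoded


gaze = [(0.2, 0.4), None, (0.3, 0.5)]
rows = encode_with_mask(gaze)
```

Here the second row carries a flag of 1.0, so the gap stays visible to the sequence model instead of being silently interpolated.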
TOOL · r/MachineLearning English(EN) · 2d

PapersWithCode new features - week 1 [P]

Hugging Face has launched new features for PapersWithCode, a platform that tracks the state of the art in AI. Leaderboards now support multiple metrics, for example on Automatic Speech Recognition and Object Detection. The platform also accepts papers from outside arXiv, enriches them automatically with relevant tags and data, and displays paper lineage to show predecessors and follow-ups.

IMPACT Enhances AI research tracking and sharing capabilities for the community.
RESEARCH · arXiv cs.AI English(EN) · 5d · [4 sources]

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

NVIDIA has introduced Gated DeltaNet-2, a linear attention layer designed to improve how recurrent models edit their memory. It separates erasing old information from writing new information by using distinct channel-wise gates, addressing a limitation of previous delta-rule architectures. At 1.3 billion parameters and trained on 100 billion tokens, Gated DeltaNet-2 outperforms existing models such as Mamba-2 and KDA, particularly on long-context retrieval tasks.

IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.
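
In the classic delta rule, a single coefficient controls both how much of the old association is erased and how much of the new value is written. The sketch below shows what decoupling the two could look like with separate per-channel gates. The exact Gated DeltaNet-2 formulation is not given in this summary, so the update form and the names `erase` and `write` are assumptions.

```python
def delta_step(S, k, v, erase, write):
    """S[i][j] <- S[i][j] - erase[i] * (S k)[i] * k[j] + write[i] * v[i] * k[j]."""
    # Read out what the state currently stores under key k.
    Sk = [sum(sij * kj for sij, kj in zip(row, k)) for row in S]
    return [[S[i][j] - erase[i] * Sk[i] * k[j] + write[i] * v[i] * k[j]
             for j in range(len(k))]
            for i in range(len(S))]


S = [[2.0, 0.0], [0.0, 3.0]]
k = [1.0, 0.0]
v = [5.0, 7.0]

# Equal gates recover the coupled delta rule: the old value under k is replaced by v.
coupled = delta_step(S, k, v, erase=[1.0, 1.0], write=[1.0, 1.0])
# Decoupled gates can clear the slot for k without writing anything new.
erase_only = delta_step(S, k, v, erase=[1.0, 1.0], write=[0.0, 0.0])
```

With equal gates the slot addressed by `k` ends up holding `v`, while the erase-only call empties that slot and leaves the other column of the state unchanged.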
SIGNIFICANT · Together AI blog (SW) · 2mo

Mamba-3

Together AI has released Mamba-3, a state space model (SSM) that prioritizes inference efficiency over training speed. It features a more expressive recurrence formula, complex-valued state tracking, and a multi-input, multi-output (MIMO) variant that improves accuracy without sacrificing decoding speed. At the 1.5B parameter scale, Mamba-3 SISO achieves lower prefill and decode latency than previous Mamba versions and the Llama-3.2-1B Transformer model. The team has also open-sourced the model's kernels, developed in collaboration with researchers from Carnegie Mellon University, Princeton University, and Cartesia AI.

IMPACT Sets a new benchmark for inference efficiency in state space models, potentially influencing future LLM architectures and deployment strategies.
