Brief

last 24h

[3/3] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
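The scoring formula itself is not published here, so the following is only a minimal sketch of how four such signals could be folded into a 0–100 score. The weights, the 48-hour half-life, and the names `Signals` and `score` are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Hypothetical weights and half-life; the digest does not state its real values.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 48.0


@dataclass
class Signals:
    authority: float         # source reputation, normalized to 0..1
    cluster_strength: float  # how many independent sources cover the story, 0..1
    headline_signal: float   # keyword/novelty signal from the headline, 0..1
    age_hours: float         # time since publication


def clamp01(x: float) -> float:
    return max(0.0, min(1.0, x))


def score(s: Signals) -> float:
    """Weighted sum of the three quality signals, scaled by exponential time decay."""
    base = sum(w * clamp01(getattr(s, name)) for name, w in WEIGHTS.items())
    decay = 0.5 ** (max(0.0, s.age_hours) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumed settings, a story with perfect signals loses half its score every 48 hours, which is one simple way to let fresh items outrank older ones.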

TOOL · r/MachineLearning English (EN) · 2d

PapersWithCode new features - week 1 [P]

Hugging Face has launched new features for PapersWithCode, a platform that tracks the state of the art in AI. Leaderboards now support multiple metrics, for example on Automatic Speech Recognition and Object Detection. The platform also accepts papers from outside arXiv, automatically enriches them with relevant tags and data, and displays paper lineage so readers can see a paper's predecessors and follow-ups.

IMPACT Enhances AI research tracking and sharing capabilities for the community.

RESEARCH · arXiv cs.AI English (EN) · 4d · [4 sources]

Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention

NVIDIA has introduced Gated DeltaNet-2, a new linear attention layer designed to improve memory editing in recurrent neural networks. The layer separates erasing old information from writing new information by giving each its own channel-wise gate, addressing a limitation of previous delta-rule architectures. At 1.3 billion parameters, trained on 100 billion tokens, Gated DeltaNet-2 outperforms existing models such as Mamba-2 and KDA, particularly on long-context retrieval tasks.

IMPACT Enhances long-context processing in recurrent models, potentially improving performance on complex language tasks.
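To make the erase/write distinction concrete, here is a toy sketch of a delta-rule style memory update with separate per-channel gates. It is not the paper's formulation, which this summary does not give. The function names, the gate shapes, and the unit-norm key are assumptions. Setting both gates to the same scalar recovers the classic delta rule, where one strength controls both erasing and writing.

```python
def matvec(S, k):
    """Read from memory: multiply the d_v x d_k state S by the key k."""
    return [sum(s * kj for s, kj in zip(row, k)) for row in S]


def decoupled_update(S, k, v, erase, write):
    """One memory edit with independent gates (illustrative assumption).

    S:     d_v x d_k state matrix (list of rows)
    k:     key vector of length d_k, assumed unit-norm
    v:     new value vector of length d_v
    erase: per-channel gate in [0, 1], how much of the old value to remove
    write: per-channel gate in [0, 1], how much of the new value to add
    """
    old = matvec(S, k)  # value currently stored under this key
    delta = [w * vi - e * oi for e, w, vi, oi in zip(erase, write, v, old)]
    # Rank-1 correction: add the outer product of delta and k to the state.
    return [[s + d * kj for s, kj in zip(row, k)] for row, d in zip(S, delta)]


def delta_rule_update(S, k, v, beta):
    """Classic delta rule as the tied special case: erase == write == beta."""
    n = len(v)
    return decoupled_update(S, k, v, [beta] * n, [beta] * n)
```

With separate gates, an update can overwrite one channel while leaving another untouched, which a single shared strength cannot express.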

SIGNIFICANT · Together AI blog (SW) · 2mo

Mamba-3

Together AI has released Mamba-3, a new state space model (SSM) that prioritizes inference efficiency over training speed. It features a more expressive recurrence formula, complex-valued state tracking, and a multi-input, multi-output (MIMO) variant that improves accuracy without sacrificing decoding speed. At the 1.5B-parameter scale, Mamba-3 SISO shows lower prefill and decode latency than previous Mamba versions and even the Llama-3.2-1B Transformer model. The team has also open-sourced the model's kernels, developed in collaboration with researchers from Carnegie Mellon University, Princeton University, and Cartesia AI.

IMPACT Sets a new benchmark for inference efficiency in state space models, potentially influencing future LLM architectures and deployment strategies.
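As a rough intuition for why complex-valued state can help with state tracking, the toy below keeps a single complex number and rotates it by half a turn for every 1 in the input, so its sign records the running parity. This is only an illustration of rotation-based tracking. It is not Mamba-3's recurrence, and the function name is hypothetical.

```python
import cmath


def parity_via_rotation(bits):
    """Track parity of a 0/1 sequence with one complex-valued state.

    Each input multiplies the state by exp(i * pi * bit): a 1 rotates the
    state by 180 degrees, a 0 leaves it unchanged.
    """
    h = 1 + 0j  # initial state on the unit circle
    for b in bits:
        h *= cmath.exp(1j * cmath.pi * b)
    # The state ends near +1 for an even count of ones and near -1 for odd.
    return 0 if h.real > 0 else 1
```

For example, `parity_via_rotation([1, 0, 1, 1])` sees three ones and returns 1.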
