Brief

last 24h

[2/2] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · r/MachineLearning English(EN) · 15h

𝐃𝐞𝐥𝐭𝐚 𝐀𝐭𝐭𝐞𝐧𝐭𝐢𝐨𝐧 𝐑𝐞𝐬𝐢𝐝𝐮𝐚𝐥𝐬 [R]

Researchers have introduced Delta Attention Residuals, a novel upgrade to residual connections in neural networks that improves cross-layer routing. This method routes over the deltas of hidden states, rather than the cumulative states themselves, which helps prevent routing collapse in deep layers. The technique has demonstrated consistent gains in perplexity across various model sizes and can be applied via drop-in fine-tuning of pretrained models with minimal parameter overhead. AI

IMPACT This architectural improvement could lead to more efficient and performant large language models.
RESEARCH · Hugging Face Daily Papers English(EN) · 1w · [12 sources]

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Researchers have developed several new optimization techniques to improve deep learning model training. AMUSE combines the rapid adaptation of Muon with the stability of Schedule-Free averaging, eliminating the need for learning rate schedules and improving performance across vision and language tasks. Another approach, MiMuon, enhances the generalization capabilities of Muon by blending it with SGD, offering a lower generalization error. Additionally, a new optimizer called Pion addresses Muon's limitations in vision-language-action and reinforcement learning by employing a spectral high-pass filtering mechanism. AI

IMPACT These new optimizers aim to improve training efficiency and generalization for large models, potentially accelerating development in areas like LLMs and robotics.
- Muon optimizer
- MiMuon
- YOLO26m
- Qwen3-0.6B
- Schedule-Free
- AMUSE
- Qwen3
- SGD
- Muon
- AdamW

Brief

𝐃𝐞𝐥𝐭𝐚 𝐀𝐭𝐭𝐞𝐧𝐭𝐢𝐨𝐧 𝐑𝐞𝐬𝐢𝐝𝐮𝐚𝐥𝐬 [R]

Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR