muon
PulseAugur coverage of muon — every cluster mentioning muon across labs, papers, and developer communities, ranked by signal.
13 days with sentiment data
Aurora optimizer may outperform Muown in addressing Muon's neuron death
Tilde Research's Aurora optimizer is designed specifically to fix 'neuron death' in Muon, a problem that Muown does not explicitly address. Muown mitigates spectral norm drift, whereas Aurora targets neuron inactivity directly, which could yield broader performance gains in scenarios where neuron death is the primary bottleneck.
Muon optimizer's spectral norm drift is a key area for improvement
Multiple recent papers (Muown, Pion, and the general mode connectivity research) highlight issues related to spectral norms and Muon. Muown explicitly addresses 'upward drift of spectral norms', while Pion aims to 'preserve spectrum'. This suggests that managing spectral properties is a critical challenge for Muon's stability and performance.
Spectrum preservation is a common theme in new optimizer research
The introduction of Pion, which 'preserves spectrum', and Muown, which addresses 'spectral norm drift', indicates a broader trend in optimizer development. This focus on maintaining spectral properties suggests that current optimizers, including Muon, may suffer from spectral instability that hinders training.
Muon's spectral properties are being actively studied in relation to optimizer behavior and mode connectivity
Multiple recent clusters highlight research into Muon's spectral properties and how they interact with optimization dynamics. The connection between optimizers, spectral norms, and mode connectivity suggests ongoing theoretical and empirical work is exploring fundamental aspects of Muon's behavior.
Muon's neuron death issue may be addressed by new optimizers like Aurora within 3 months
The Tilde Research launch of Aurora specifically targets neuron death in Muon. Given Aurora's public release and demonstrated effectiveness, it's plausible that Muon users will adopt Aurora or similar solutions to mitigate this issue within the next quarter.
-
New optimizers DMuon and HiMuon boost AI training efficiency · 6 sources tracked
Researchers have developed two new optimization techniques, DMuon and Hierarchical Muon (HiMuon), to improve the efficiency of matrix-orthogonalization-based optimizers like Muon. DMuon integrates into existing training…
-
New MD Decoupling method improves neural network training
Researchers have introduced a novel technique called Magnitude-Direction (MD) Decoupling to enhance neural network training. This method separates the magnitude and direction of weight vectors, allowing them to be upda…
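As a rough illustration of what separating magnitude from direction means, the sketch below splits a weight vector into its Euclidean norm and a unit-length direction, then recombines them. The helper names `decouple` and `recombine` are hypothetical, and the actual update rule used by MD Decoupling is not shown in the summary above, so none is implemented here.

```python
import numpy as np

def decouple(w):
    """Split a nonzero weight vector into (magnitude, unit direction)."""
    magnitude = np.linalg.norm(w)
    direction = w / magnitude
    return magnitude, direction

def recombine(magnitude, direction):
    """Rebuild the weight vector from its two parts."""
    return magnitude * direction

w = np.array([3.0, 4.0])
magnitude, direction = decouple(w)

# Illustrative only: changing the magnitude leaves the direction untouched.
w_scaled = recombine(2.0 * magnitude, direction)
```

Once the two parts are stored separately, each can in principle be changed without disturbing the other, which is the property the summary points to.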
-
Open problem: AdamW optimizer's effectiveness under heavy-tailed noise in LLMs
A recent paper poses an open problem regarding the effectiveness of the AdamW optimizer in training large language models (LLMs) under heavy-tailed noise conditions. While AdamW is widely used, its theoretical understan…
-
New AngularMuown optimizer improves Transformer pre-training
Researchers have introduced AngularMuown, a novel optimization algorithm that implicitly performs angular step-size decay, building upon the principles of matrix-aware optimizers like Muon and Muown. This new method exp…
-
New Schattor optimization methods unify SGD and Muon for deep learning
Researchers have introduced Schattor, a new family of adaptive optimization methods for deep learning that utilize Schatten norms. This framework unifies existing methods like SGD and Muon, addressing challenges posed b…
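For readers unfamiliar with Schatten norms, here is a minimal sketch of the definition: the Schatten-p norm of a matrix is the ordinary p-norm of its singular values. The cases p = 1, p = 2, and p = infinity are the nuclear, Frobenius, and spectral norms respectively. The function name `schatten_norm` is a hypothetical helper, and how Schattor itself uses these norms is not specified in the summary above.

```python
import numpy as np

def schatten_norm(matrix, p):
    """Schatten-p norm: the vector p-norm of the singular values (p >= 1)."""
    s = np.linalg.svd(matrix, compute_uv=False)
    if np.isinf(p):
        return float(s.max())  # spectral norm
    return float((s ** p).sum() ** (1.0 / p))

example = np.diag([3.0, 4.0])  # singular values are 4 and 3
nuclear = schatten_norm(example, 1)
frobenius = schatten_norm(example, 2)
spectral = schatten_norm(example, np.inf)
```

Moving p from 2 toward infinity shifts the emphasis from the whole spectrum to the single largest singular value, which gives some intuition for how one family of norms can span different geometries.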
-
CacheMuon optimizes AI training by reusing temporal preconditioning data
Researchers have introduced CacheMuon, a novel temporal preconditioning method designed to optimize the computation of polar factors in the Muon optimizer. By leveraging the temporal correlation of these factors across …
-
Research paper details optimal Schatten-p norm usage in deep learning
A new research paper explores the optimal use of Schatten-p norms in deep learning, particularly in relation to optimizers like Muon. The study demonstrates that the effectiveness of these norms is dependent on the spec…
-
Poolside releases Laguna M.1, a 225B MoE model for agentic coding
Poolside has released Laguna M.1, a 225 billion parameter Mixture-of-Experts model optimized for agentic coding tasks. The model features a large sparse MoE architecture with 256 experts and global attention, enabling i…
-
New Muon^p Optimizer Enhances Fine-Tuning of Large Models
Researchers have introduced Muon^p, an optimization technique that refines the existing Muon optimizer by using fractional spectral-power updates. This method interpolates between full spectral flattening and standard…
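One way to picture a fractional spectral-power update is to raise the singular values of a matrix to a power p while keeping its singular vectors. The sketch below does exactly that. It is an assumption that the paper defines its update this way; the function name `spectral_power` and the restriction of p to the range [0, 1] are illustrative choices, not taken from the source.

```python
import numpy as np

def spectral_power(matrix, p):
    """Return U @ diag(s**p) @ Vt for matrix = U @ diag(s) @ Vt.

    Assumed form of a fractional spectral-power map (not confirmed by the source).
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * s ** p) @ vt  # broadcasting scales the columns of u

rng = np.random.default_rng(0)
grad = rng.standard_normal((5, 3))

flattened = spectral_power(grad, 0.0)  # every singular value becomes 1
halfway = spectral_power(grad, 0.5)    # singular values become sqrt(s)
unchanged = spectral_power(grad, 1.0)  # recovers the input matrix
```

In this sketch p = 0 flattens the spectrum completely and p = 1 leaves the matrix as it was, so intermediate values of p compress the spectrum only partially.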
-
New Theory Explains Muon Optimization Success in LLMs
A new research paper provides a theoretical framework for understanding the success of non-Euclidean optimization methods like Muon and Scion in training Transformer models. The study focuses on the heavy-tailed non-con…
-
Zeta Optimizer Improves Neural Network Training with Dual Whitening
Researchers have introduced Zeta, a novel dual whitening optimizer designed to improve large-scale neural network training. Zeta addresses the scale heterogeneity in momentum matrices, a vulnerability in existing matrix…
-
LoRA-Muon: New Optimizer Boosts Deep Learning Fine-Tuning Efficiency
Researchers have introduced LoRA-Muon, an optimization technique designed to improve the efficiency and effectiveness of Low-Rank Adaptation (LoRA) for deep learning models. This new method applies spectral steepest-des…
-
New optimization techniques emerge for faster, more efficient AI model training · 8 sources tracked
Several recent arXiv papers explore advancements in optimization techniques for machine learning. Researchers have proposed new methods like Weight Adaptation ASNG (WA-ASNG) to improve parallel performance in evolutiona…
-
Feedback Alignment training method improved with new dimensionality techniques
Researchers have identified a key limitation in Feedback Alignment (FA), a method for training neural networks that bypasses the biological implausibility of backpropagation. They found that FA's error signals have a lo…
-
New FOGO optimizer tackles AI model forgetting
Researchers have introduced FOGO, a novel optimizer designed to combat forgetting during AI model training. FOGO addresses both short-term forgetting at each training step and long-term forgetting common in continual le…
-
Muon^2 optimizer boosts foundation model training efficiency
Researchers have developed Muon^2, an enhanced version of the Muon optimizer designed for large-scale foundation model pre-training. Muon^2 improves efficiency and quality by incorporating Adam-style adaptive second…
-
New 'Muon' optimization technique flattens matrix gradients
A new research paper introduces "Muon," an optimization technique that replaces matrix gradients with their polar factors. This method maintains singular directions but flattens the update spectrum, which the authors su…
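The polar factor mentioned above can be sketched with a singular value decomposition: for a gradient G = U diag(s) V^T, the polar factor is U V^T, which keeps the singular vectors and sets every singular value to 1. This is a minimal NumPy illustration with a hypothetical helper name, `polar_factor`, assuming a full-rank gradient. Practical Muon implementations typically approximate this factor with an iterative scheme rather than an exact SVD.

```python
import numpy as np

def polar_factor(grad):
    """Return U @ Vt, the orthogonal polar factor of grad = U @ diag(s) @ Vt."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))

update = polar_factor(grad)

# The singular directions are kept, but the spectrum of the update is flat.
update_singular_values = np.linalg.svd(update, compute_uv=False)
```

Because all singular values of the update equal 1, no single direction dominates the step, which is what flattening the update spectrum refers to.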
-
New DoPr optimization boosts AI test-time performance
Researchers have introduced a new optimization technique called Double Preconditioning (DoPr) designed to improve the performance of deep learning models in test-time feedback (TTF) scenarios. This method combines gradi…
-
Muon optimizer shows training efficiency gains over Adam
A new research paper explores the performance advantages of the Muon optimizer over Adam in large language model training. The study, titled "Why Muon Outperforms Adam: A Curvature Perspective," suggests Muon achieves g…
-
MuLoCo framework enhances LLM training with Muon optimizer
Researchers have introduced MuLoCo, a new framework designed to optimize the training of large language models (LLMs) within the DiLoCo system. MuLoCo addresses performance degradation observed in DiLoCo as the number o…