PulseAugur

muon

PulseAugur coverage of muon — every cluster mentioning muon across labs, papers, and developer communities, ranked by signal.

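Total · 30 days: 23 (90 days: 23)
Releases · 30 days: 0 (90 days: 0)
Papers · 30 days: 22 (90 days: 22)
Charts: tier distribution (90 days), relations, sentiment (30 days; 9 days with sentiment data)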

LAB BRAIN
observation resolved confirmed confidence 0.75

Spectrum preservation is a common theme in new optimizer research

The introduction of Pion, which 'preserves spectrum', and Muown, which addresses 'spectral norm drift', indicates a broader trend in optimizer development. This focus on maintaining spectral properties suggests that current optimizers, including Muon, may suffer from spectral instability that hinders training.

observation resolved contradicted confidence 0.85

Muon optimizer's spectral norm drift is a key area for improvement

Multiple recent papers (Muown, Pion, and the general mode connectivity research) highlight issues related to spectral norms and Muon. Muown explicitly addresses 'upward drift of spectral norms', while Pion aims to 'preserve spectrum'. This suggests that managing spectral properties is a critical challenge for Muon's stability and performance.

hypothesis active confidence 0.60

Aurora optimizer may outperform Muown in addressing Muon's neuron death

Tilde Research's Aurora optimizer is specifically designed to fix 'neuron death' in Muon, a problem not explicitly addressed by Muown. While Muown mitigates spectral norm drift, Aurora's targeted approach to neuron inactivity could lead to broader performance gains, especially where neuron death is the primary bottleneck.

observation resolved confirmed confidence 0.80

Muon's spectral properties are being actively studied in relation to optimizer behavior and mode connectivity

Multiple recent clusters highlight research into Muon's spectral properties and how they interact with optimization dynamics. The connection between optimizers, spectral norms, and mode connectivity suggests ongoing theoretical and empirical work is exploring fundamental aspects of Muon's behavior.

hypothesis resolved contradicted confidence 0.70

Muon's neuron death issue may be addressed by new optimizers like Aurora within 3 months

The Tilde Research launch of Aurora specifically targets neuron death in Muon. Given Aurora's public release and demonstrated effectiveness, it's plausible that Muon users will adopt Aurora or similar solutions to mitigate this issue within the next quarter.


Recent · page 1 of 2 · 23 items
  1. RESEARCH · CL_44881 ·

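    Study finds optimizer choice substantially changes Transformer scaling laws

    A new research paper shows that the choice of optimizer significantly affects the capacity and scaling laws of Transformer models, even when the architecture is held fixed. The study finds that the Muon optimizer achieves linear scaling in representational capacity, a 2.3x improvement over the weaker scaling of AdamW, especially in the challenging rare-token regime. This suggests that the optimizer should be treated as a primary factor in model scaling alongside architecture and data, and it highlights the potential of co-designing optimizers and architectures for better performance.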

  2. TOOL · CL_40880 ·

    LionMuon optimizer cuts training cost for large models

    Researchers have introduced LionMuon, a novel optimization algorithm designed for efficient training of large-scale models. This method alternates between the low-cost updates of Lion and the stronger, albeit more expen…

  3. RESEARCH · CL_39993 ·

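    New optimizers AMUSE, MiMuon, and Pion enhance deep learning training

    Researchers have developed several new optimization techniques to improve the training of deep learning models. AMUSE combines Muon's fast adaptation with the stability of schedule-free averaging, improving performance on vision and language tasks without a learning-rate schedule. Another method, MiMuon, strengthens Muon's generalization by fusing it with SGD, yielding lower generalization error. In addition, a new optimizer called Pion addresses Muon's limitations in vision-language-action and reinforcement learning settings by using a spectral high-pass filtering mechanism.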

  4. RESEARCH · CL_38176 ·

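    Ringmaster LMO method improves asynchronous neural network training

    Researchers have developed Ringmaster LMO, a novel asynchronous method for training neural networks that addresses inefficiencies in distributed systems. The method manages gradient staleness through a delay-threshold concept and aims to speed up training in heterogeneous environments. It is designed for unconstrained stochastic non-convex optimization and, in experiments on quadratic problems and language model pre-training, outperforms existing synchronous and asynchronous baselines.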

  5. RESEARCH · CL_29301 ·

    Pion optimizer preserves spectrum for stable LLM training

    Researchers have introduced Pion, a novel spectrum-preserving optimizer designed for training large language models. Unlike traditional additive optimizers like Adam, Pion utilizes orthogonal transformations to update w…

  6. RESEARCH · CL_28033 ·

    Tilde Research launches Aurora optimizer to fix neuron death in Muon

    Tilde Research has introduced Aurora, a novel optimizer designed to train neural networks more effectively. Aurora addresses a critical issue in the popular Muon optimizer where a significant number of neurons become pe…

  7. RESEARCH · CL_28256 ·

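    Muown optimizer improves LLM training by controlling row-norm drift

    Researchers have developed Muown, a new optimization method intended to improve the training of large language models. Muown addresses a problem with the Muon optimizer, specifically the upward drift of spectral norms in weight matrices during training. By treating the vector of row magnitudes as an explicit variable, Muown improves perplexity and learning-rate stability across a range of model sizes, outperforming existing optimizers such as AdamW and Lion.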

  8. TOOL · CL_27538 ·

    New research links optimizers to mode connectivity in neural networks

    Researchers have explored the role of optimizers in mode connectivity within neural networks, a concept previously underexplored. Their work demonstrates that solutions generated by a single optimizer, such as AdamW or …

  9. TOOL · CL_25998 ·

    Muon framework offers new spectral Wasserstein distances for deep learning

    Researchers have introduced a new framework called Muon to stabilize deep-learning optimization using spectral normalizations, particularly for matrix-shaped parameters. This work idealizes the continuous-time, vanishin…

  10. TOOL · CL_27720 ·

    Muon optimizer analysis reveals distinct convergence phases vs. SignSGD

    Researchers have analyzed stochastic spectral optimizers, including Muon, in a high-dimensional matrix-valued least squares problem. Their analysis reveals that SignSVD, which Muon approximates, performs a square-root p…

  11. RESEARCH · CL_24593 ·

    Aurora optimizer boosts neural network training efficiency

    Researchers have introduced Aurora, a new optimizer designed to improve the training of large neural networks, particularly those with rectangular matrices. Aurora addresses issues like neuron death in MLP layers that c…

  12. TOOL · CL_27734 ·

    Muon optimizer fails on convex Lipschitz functions, study finds

    A new paper challenges the theoretical underpinnings of the Muon optimization algorithm, demonstrating that it does not converge on convex Lipschitz functions. The research suggests that Muon's practical success likely …

  13. TOOL · CL_25579 ·

    OrScale optimization method improves neural network training

    Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds upon the Muon method by incorporating layer-wise trust-ratio scaling, which measures the Fr…

  14. TOOL · CL_21984 ·

    Pro-KLShampoo optimizer improves LLM pre-training with spectral structure analysis

    Researchers have developed Pro-KLShampoo, an optimization technique that combines gradient preconditioning with orthogonalization for more efficient LLM pre-training. This method leverages the observed spike-and-flat ei…

  15. TOOL · CL_21923 ·

    New LMO-IGT method accelerates optimization with implicit gradient transport

    Researchers have introduced LMO-IGT, a novel class of stochastic optimization methods designed to accelerate convergence in machine learning. This approach leverages implicit gradient transport (IGT) to achieve faster r…

  16. RESEARCH · CL_22113 ·

    New research links optimizer choice to reduced forgetting in LLM finetuning

    Researchers have explored the impact of optimizer consistency during the fine-tuning of large language models. One study suggests that using the same optimizer for both pre-training and fine-tuning leads to less knowled…

  17. RESEARCH · CL_29329 ·

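    Theory explains performance gains of the SignSGD and Muon optimizers

    Researchers have theoretically analyzed why sign-based optimization algorithms such as SignSGD and Muon can outperform standard SGD when training large models. One new study shows that SignSGD's advantage stems from its effectiveness under specific conditions, such as sparse noise and $\ell_1$-norm stationarity, which standard SGD handles inefficiently. Another paper questions whether Muon's complex geometric structure is necessary, proposing that simpler approaches such as random or inverted spectra can achieve similar performance by focusing on local alignment and descent potential.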

  18. TOOL · CL_18835 ·

    New Polar Express method accelerates matrix decomposition for deep learning

    Researchers have developed a new GPU-friendly algorithm called Polar Express for computing matrix decompositions, which is crucial for the Muon optimizer used in training deep neural networks. This method optimizes for …

  19. RESEARCH · CL_18340 ·

    Nora optimizer achieves efficiency, stability, and speed for large-scale LLM training

    Researchers have introduced Nora, a novel matrix-based optimizer designed for efficient and stable training of large language models. Nora aims to unify efficiency, stability, and speed, addressing limitations of existi…

  20. RESEARCH · CL_14458 ·

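    New theory unifies adaptive optimization methods for non-convex machine learning

    Researchers have developed a unified framework for analyzing first-order optimization algorithms used in non-convex machine learning. The framework covers popular methods such as AdaGrad and AdaNorm, as well as variants of Shampoo and Muon. The analysis provides stochastic convergence rates for these methods, even with momentum and without assuming bounded gradients or small step sizes.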