PulseAugur
Muown: Row-Norm Control for Muon Optimization

Muown optimizer improves LLM training by controlling row-norm drift

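Researchers have developed Muown, a new optimization method intended to improve the training of large language models. Muown addresses a problem with the Muon optimizer: the upward drift of the spectral norm of weight matrices during training. By treating the vector of row magnitudes as an explicit variable, Muown improves perplexity and learning-rate stability across a range of model sizes, outperforming existing optimizers such as AdamW and Lion.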
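The summary does not spell out Muown's update rule. As a rough illustration of what treating row magnitudes as an explicit variable can mean, the sketch below splits a weight matrix into per-row norms and unit-norm row directions, then recombines them. The names `split_rows` and `merge_rows` are hypothetical, and this shows only the decomposition idea under that assumption, not the paper's algorithm.

```python
import math

def split_rows(weight):
    """Split a matrix (list of rows) into row magnitudes and unit-norm row directions."""
    magnitudes = [math.sqrt(sum(x * x for x in row)) for row in weight]
    directions = [
        [x / m for x in row] if m > 0 else list(row)  # leave all-zero rows untouched
        for row, m in zip(weight, magnitudes)
    ]
    return magnitudes, directions

def merge_rows(magnitudes, directions):
    """Rebuild the matrix by scaling each unit-norm row by its magnitude."""
    return [[m * x for x in row] for m, row in zip(magnitudes, directions)]

W = [[3.0, 4.0], [0.0, 2.0]]
g, V = split_rows(W)          # g holds the row norms, V the row directions
W_rebuilt = merge_rows(g, V)  # matches W up to floating-point rounding

# Changing g alone rescales the rows while leaving their directions fixed.
W_rescaled = merge_rows([1.0, 1.0], V)
```

With the magnitudes held in a separate vector, row norms can be inspected or constrained directly instead of being an implicit side effect of matrix updates.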

Impact: Improves the efficiency and stability of LLM training, which could support larger models and faster development cycles.

Ranking rationale: This cluster contains an academic paper detailing a new optimization method for language model training.

AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

  1. Hugging Face Daily Papers · TIER_1 · English (EN)

    Muown: Row-Norm Control for Muon Optimization

    Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drifts upward over training. Thro…

  2. arXiv cs.LG · TIER_1 · English (EN) · Niao He

    Muown: Row-Norm Control for Muon Optimization

    Muon has emerged as a strong competitor to AdamW for language model pre-training, yet its behavior at scale is sensitive to weight decay. Recent work has observed that, for Muon without decoupled weight decay, the spectral norm of weight matrices drifts upward over training. Thro…