PulseAugur

OrScale optimization method improves neural network training

Researchers have introduced OrScale, an optimization technique for training neural networks. OrScale builds on the Muon optimizer by adding layer-wise trust-ratio scaling based on the Frobenius norm of the update direction actually applied in parameter space. The approach, described in a new paper, aims to improve on existing optimizers such as Muon and AdamW, particularly for language models.
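To make the trust-ratio idea concrete, here is a minimal sketch in plain Python. It assumes a LARS/LAMB-style ratio ||W||_F / ||D||_F, where W is a layer's weight matrix and D is the direction actually applied to it (for example, an orthogonalized Muon update). The numerator, the epsilon term, and the helper names (frobenius_norm, trust_ratio_step, apply_update) are illustrative assumptions. This is not OrScale's published rule.

```python
import math
from typing import List

Matrix = List[List[float]]


def frobenius_norm(m: Matrix) -> float:
    """Square root of the sum of squared entries of a matrix."""
    return math.sqrt(sum(x * x for row in m for x in row))


def trust_ratio_step(weight: Matrix, direction: Matrix, lr: float,
                     eps: float = 1e-8) -> Matrix:
    """Hypothetical helper: scale an update direction by a layer-wise trust ratio.

    Assumption: ratio = ||W||_F / (||D||_F + eps), where D is the direction
    actually applied in parameter space. The returned step is lr * ratio * D.
    """
    ratio = frobenius_norm(weight) / (frobenius_norm(direction) + eps)
    return [[lr * ratio * d for d in row] for row in direction]


def apply_update(weight: Matrix, direction: Matrix, lr: float) -> Matrix:
    """Hypothetical helper: return W - lr * ratio * D as a new matrix."""
    step = trust_ratio_step(weight, direction, lr)
    return [[w - s for w, s in zip(w_row, s_row)]
            for w_row, s_row in zip(weight, step)]


if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 4.0]]   # ||W||_F = 5
    D = [[1.0, 0.0], [0.0, 1.0]]   # ||D||_F = sqrt(2)
    step = trust_ratio_step(W, D, lr=0.01)
    print(round(frobenius_norm(step), 6))  # prints 0.05, i.e. about lr * ||W||_F
```

The ratio divides out ||D||_F, so under this assumption the step's Frobenius norm comes out close to lr × ||W||_F regardless of the raw size of D. Each layer's step size then scales with that layer's own weight norm rather than being set by the global learning rate alone.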

Impact: Introduces a new optimization technique with reported empirical improvements on benchmarks, which could make model training more efficient.

Ranking rationale: The cluster contains a new academic paper describing a novel research method.

Read on arXiv cs.CL

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CL · TIER_1 · English (EN) · Yang You

    OrScale: Orthogonalised Optimization with Layer-Wise Trust-Ratio Scaling

    Muon improves neural-network training by orthogonalizing matrix-valued updates, but it leaves each layer's update magnitude controlled mostly by a global learning rate. We introduce OrScale, a trust-ratio extension of Muon built on a simple rule: the denominator of a layer-wise r…
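The abstract describes Muon's core step as orthogonalizing matrix-valued updates. The sketch below approximates that step in plain Python using the classic cubic Newton-Schulz iteration. The reference Muon implementation instead uses a tuned quintic Newton-Schulz polynomial with only a handful of steps, so read this as an illustration of the idea rather than Muon's or OrScale's code. The names matmul, transpose, and orthogonalize, and the step count of 30, are illustrative choices.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    """Swap rows and columns."""
    return [list(col) for col in zip(*a)]


def orthogonalize(g: Matrix, steps: int = 30, eps: float = 1e-12) -> Matrix:
    """Approximate the orthogonal polar factor of g with cubic Newton-Schulz.

    Dividing by the Frobenius norm puts every singular value in (0, 1], where
    the map s -> 1.5*s - 0.5*s**3 drives each nonzero singular value to 1
    while leaving the singular vectors unchanged.
    """
    norm = math.sqrt(sum(x * x for row in g for x in row))
    x = [[v / (norm + eps) for v in row] for row in g]
    for _ in range(steps):
        x_xt_x = matmul(matmul(x, transpose(x)), x)   # X X^T X
        x = [[1.5 * p - 0.5 * q for p, q in zip(x_row, y_row)]
             for x_row, y_row in zip(x, x_xt_x)]
    return x


if __name__ == "__main__":
    G = [[1.0, 2.0], [3.0, 4.0]]      # a raw, gradient-like update
    O = orthogonalize(G)
    print(matmul(O, transpose(O)))   # approximately the 2x2 identity
```

The cubic variant converges more slowly than a tuned quintic one. However, it ends at an orthogonal matrix up to rounding error, which makes the result easy to check: O multiplied by its transpose should be close to the identity.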