Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds on the Muon method by incorporating layer-wise trust-ratio scaling based on the Frobenius norm of the actual parameter-space update direction. The approach, detailed in a new paper, aims to improve on existing optimizers such as Muon and AdamW, particularly for language models.
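The summary does not give OrScale's exact update rule, but layer-wise trust-ratio scaling in the LARS/LAMB tradition typically rescales each layer's update by the ratio of the weight matrix's Frobenius norm to the update's Frobenius norm. A minimal sketch of that general idea, assuming a LARS-style ratio (the function name and formula here are illustrative, not from the paper):

```python
import numpy as np

def trust_ratio_scale(param, update, eps=1e-8):
    """Rescale a layer's update so its Frobenius norm tracks the
    parameter's Frobenius norm (LARS/LAMB-style trust ratio).

    Illustrative sketch only: OrScale's precise scaling rule is not
    specified in the summary above.
    """
    param_norm = np.linalg.norm(param)    # Frobenius norm of the weights
    update_norm = np.linalg.norm(update)  # Frobenius norm of the applied direction
    ratio = param_norm / (update_norm + eps)
    return ratio * update

# Example: after scaling, the update's norm matches the weight norm.
W = np.ones((4, 4))
g = np.full((4, 4), 0.1)
scaled = trust_ratio_scale(W, g)
```

Measuring the norm of the *actual* parameter-space direction (rather than the raw gradient) matters for methods like Muon, whose orthogonalization step changes the update's scale before it is applied.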
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a new optimization technique that shows empirical improvements on benchmarks, potentially enhancing model training efficiency.
RANK_REASON The cluster contains a new academic paper detailing a novel research method.