PulseAugur

New AngularMuown optimizer improves Transformer pre-training

Researchers have introduced AngularMuown, an optimization algorithm that builds on the matrix-aware optimizers Muon and Muown. The underlying paper argues that Muown implicitly performs angular step-size decay; AngularMuown instead optimizes normalized directions explicitly and applies a schedulable angular multiplier that is decoupled from the radial magnitude updates. Preliminary results reportedly show AngularMuown outperforming its predecessor, Muown, and currently leading the modded nanoGPT speedrunning competition. Experiments on Qwen2 models indicate the algorithm scales effectively to larger parameter counts.
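
The idea of separating a weight matrix into row magnitudes and normalized directions, and then stepping the directions by a scheduled angle, can be illustrated with a small sketch. This is not the paper's algorithm: Muon's matrix-aware direction update and the Adam update for the magnitudes are omitted, and the names `split_rows`, `angular_step`, and `cosine_angle`, the cosine schedule, and the gradient values are all assumptions made for illustration.

```python
import math

def split_rows(W):
    """Split each row of W into a magnitude and a unit-norm direction."""
    mags = [math.sqrt(sum(x * x for x in row)) for row in W]
    dirs = [[x / m for x in row] for row, m in zip(W, mags)]
    return mags, dirs

def angular_step(d, g, angle):
    """Rotate unit vector d by `angle` radians against gradient g.

    g is first projected onto the tangent space at d, so the step changes
    only the direction and the result stays unit-norm.
    """
    dot = sum(a * b for a, b in zip(d, g))
    t = [b - dot * a for a, b in zip(d, g)]  # tangent component of g
    t_norm = math.sqrt(sum(x * x for x in t))
    if t_norm == 0.0:
        return list(d)  # gradient is purely radial: no rotation
    return [math.cos(angle) * a - math.sin(angle) * x / t_norm
            for a, x in zip(d, t)]

def cosine_angle(step, total_steps, max_angle):
    """Hypothetical schedule for the angular multiplier (cosine decay)."""
    return 0.5 * max_angle * (1.0 + math.cos(math.pi * step / total_steps))

W = [[3.0, 4.0], [0.0, 2.0]]
G = [[1.0, 0.0], [1.0, 1.0]]  # made-up gradient, for illustration only

mags, dirs = split_rows(W)
angle = cosine_angle(step=0, total_steps=100, max_angle=0.1)
new_dirs = [angular_step(d, g, angle) for d, g in zip(dirs, G)]

# Recombine: magnitudes are untouched here, so row norms are preserved.
new_W = [[m * x for x in d] for m, d in zip(mags, new_dirs)]
```

Because the angle is set by the schedule rather than by the size of the gradient, the rotation per step is the same for every row regardless of its magnitude, which is one way to read "decoupled from radial magnitude updates".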

IMPACT Introduces a novel optimization technique that could accelerate Transformer model training and improve performance.

RANK_REASON The cluster contains a research paper detailing a new optimization algorithm for machine learning models.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.LG TIER_1 English(EN) · Niao He

    Muown Implicitly Performs Angular Step-size Decay

    Matrix-aware optimizers such as Muon and Muown have recently shown strong empirical performance for pre-training Transformers. In particular, Muown separates each weight matrix into row magnitudes and an un-normalized direction variable, updating the former with Adam and the latt…