Brief · PulseAugur

TOOL · arXiv cs.LG · 7h

CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor

Researchers have introduced CacheMuon, a temporal preconditioning method that reduces the cost of computing polar factors in the Muon optimizer. Because polar factors are correlated across training iterations, CacheMuon reuses information from previous steps to approximate the current polar factor instead of recomputing it from scratch. The approach exposes a controllable trade-off between computational cost and model quality, and is reported to yield significant savings in orthogonalization FLOPs for language model and vision training with minimal degradation in validation quality.
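To make the premise concrete, here is a minimal sketch of the two ingredients the summary mentions: approximating a polar factor, and measuring how little that factor moves between consecutive steps. It is not CacheMuon's algorithm, which the brief does not spell out. It assumes the classical cubic Newton-Schulz iteration (Muon implementations typically use a tuned higher-order variant), a momentum coefficient of 0.95, and small hand-picked matrices; the names `newton_schulz_polar`, `m0`, `g1`, and `drift` are hypothetical. Plain Python lists keep the example dependency-free.

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def fro_norm(a):
    """Frobenius norm of a matrix."""
    return sum(x * x for row in a for x in row) ** 0.5


def newton_schulz_polar(g, steps=20):
    """Approximate the polar factor of a full-column-rank matrix g.

    Uses the classical cubic Newton-Schulz iteration
    X <- 1.5 * X - 0.5 * X @ X.T @ X (an assumption made for simplicity).
    """
    scale = fro_norm(g)
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # which is inside the iteration's convergence region.
    x = [[v / scale for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(x, matmul(transpose(x), x))
        x = [[1.5 * p - 0.5 * q for p, q in zip(rp, rq)] for rp, rq in zip(x, xxtx)]
    return x


# Hypothetical momentum matrix at step t (illustrative values only).
m0 = [[2.0, 0.1, 0.0],
      [0.0, 1.0, 0.2],
      [0.3, 0.0, 1.5],
      [0.5, 0.5, 0.5]]

# Hypothetical gradient at step t + 1.
g1 = [[1.0, -0.5, 0.2],
      [0.3, 0.8, -0.4],
      [-0.2, 0.1, 1.0],
      [0.0, 0.6, 0.3]]

beta = 0.95  # assumed momentum coefficient
# Exponential-moving-average momentum update.
m1 = [[beta * m + (1 - beta) * g for m, g in zip(rm, rg)] for rm, rg in zip(m0, g1)]

u0 = newton_schulz_polar(m0)
u1 = newton_schulz_polar(m1)

# Relative change of the polar factor between consecutive steps.
diff = [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(u1, u0)]
drift = fro_norm(diff) / fro_norm(u0)
print(f"relative drift between consecutive polar factors: {drift:.4f}")
```

Because the momentum matrix changes only slightly per step, the drift is small. That step-to-step similarity is the temporal correlation a method like CacheMuon aims to exploit; how it does so is described in the paper, not in this sketch.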

IMPACT CacheMuon offers a controllable quality-efficiency frontier for AI training, potentially reducing computational costs for language model and vision tasks.

arXiv
Muon
language model
CacheMuon
Vision training