CacheMuon: Using Temporal Preconditioning To Approximate Polar Factor
Researchers have introduced CacheMuon, a temporal preconditioning method that reduces the cost of computing polar factors in the Muon optimizer. Because these factors are correlated across training iterations, CacheMuon reuses information from previous steps to approximate the current polar factor and avoid redundant computation. The approach offers a controllable trade-off between computational efficiency and model quality, and demonstrates significant savings in orthogonalization FLOPs for language model and vision training with minimal degradation in validation quality.
AI IMPACT: CacheMuon offers a controllable quality-efficiency frontier for AI training, potentially reducing computational costs for language model and vision tasks.
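To make the idea of reusing earlier information concrete, here is a minimal NumPy sketch of one possible caching scheme. It is not the CacheMuon algorithm, whose details are not given in this summary. It assumes a tall matrix M with full column rank, whose polar factor is U = M (MᵀM)^(-1/2), and it caches the factor (MᵀM)^(-1/2) from an earlier step and reuses it for several steps. The names `CachedPolar` and `refresh_every` are hypothetical, and the SVD is used only for clarity; Muon itself typically approximates the polar factor with Newton-Schulz iterations.

```python
import numpy as np


def polar_factor(m):
    """Exact polar factor U of m = U H, computed from a thin SVD."""
    w, _, vt = np.linalg.svd(m, full_matrices=False)
    return w @ vt


def inverse_sqrt_gram(m):
    """Return (m^T m)^(-1/2), assuming m has full column rank."""
    _, s, vt = np.linalg.svd(m, full_matrices=False)
    # V diag(1/s) V^T: scale the columns of V by 1/s, then multiply by V^T.
    return (vt.T / s) @ vt


class CachedPolar:
    """Hypothetical cache: recompute the preconditioner every few steps only."""

    def __init__(self, refresh_every):
        self.refresh_every = refresh_every  # larger means cheaper but less exact
        self.step = 0
        self.precond = None

    def __call__(self, m):
        # Pay for the expensive factorization only on refresh steps.
        if self.step % self.refresh_every == 0:
            self.precond = inverse_sqrt_gram(m)
        self.step += 1
        # On refresh steps this is the exact polar factor; on the steps in
        # between it is an approximation that relies on m changing slowly.
        return m @ self.precond
```

In this sketch, `refresh_every` plays the role of the quality-efficiency knob: a value of 1 recomputes the factor at every step and is exact, while larger values skip more factorizations at the cost of a growing approximation error as the matrix drifts.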