DeMuon: A Decentralized Muon for Matrix Optimization over Graphs
Researchers have introduced DeMuon, a novel decentralized method for matrix optimization over graphs. This approach extends the centralized Muon algorithm by incorporating matrix orthogonalization through Newton-Schulz iterations and utilizing gradient tracking to handle local function heterogeneity. DeMuon achieves iteration complexity comparable to centralized algorithms, even under heavy-tailed noise, and is presented as the first direct extension of Muon to decentralized graph optimization with theoretical guarantees. Preliminary experiments show DeMuon outperforming other decentralized algorithms in transformer pretraining tasks across various network topologies.
IMPACT Introduces a new decentralized optimization method that could improve distributed AI training efficiency.
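To make the two ingredients named above concrete (Newton-Schulz orthogonalization and gradient tracking over a graph), here is a minimal NumPy sketch under stated assumptions. It is not DeMuon's published update. The function names (`newton_schulz_orthogonalize`, `ring_mixing_matrix`, `decentralized_muon_step`), the quadratic local losses, the ring topology, the step size and the update ordering are all hypothetical choices for illustration. It also omits the momentum buffer that Muon applies before orthogonalizing. It uses the classic cubic Newton-Schulz iteration, which converges to the orthogonal polar factor; Muon itself uses a tuned higher-order polynomial. In each round, every node mixes its parameters with its neighbours, steps along its orthogonalized gradient tracker, and then updates the tracker with the change in its local gradient.

```python
import numpy as np


def newton_schulz_orthogonalize(G, steps=10, eps=1e-7):
    """Approximate the polar factor U @ Vt of G = U S Vt with the cubic
    Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X (illustrative choice)."""
    # Frobenius scaling puts every singular value in (0, 1], inside the
    # iteration's convergence region (0, sqrt(3)).
    X = G / (np.linalg.norm(G) + eps)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * X @ X.T @ X
    return X


def ring_mixing_matrix(n):
    """Hypothetical doubly stochastic gossip weights: each node averages
    itself and its two ring neighbours with weight 1/3 each."""
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def decentralized_muon_step(X, Y, grads, W, grad_fn, lr, ns_steps=10):
    """One sketch round. X, Y, grads have shape (n_nodes, rows, cols):
    parameters, gradient trackers, and local gradients at X."""
    n = X.shape[0]
    # Gossip: each node takes a W-weighted average of neighbours' parameters.
    X_mixed = np.einsum("ij,jmk->imk", W, X)
    # Each node steps along its orthogonalized tracker (its estimate of the
    # network-average gradient), not its raw local gradient.
    D = np.stack([newton_schulz_orthogonalize(Y[i], ns_steps) for i in range(n)])
    X_new = X_mixed - lr * D
    grads_new = np.stack([grad_fn(i, X_new[i]) for i in range(n)])
    # Gradient tracking: mix trackers, then add the change in local gradients.
    Y_new = np.einsum("ij,jmk->imk", W, Y) + grads_new - grads
    return X_new, Y_new, grads_new


# Demo with hypothetical heterogeneous quadratics f_i(X) = 0.5 * ||X - B_i||_F^2.
rng = np.random.default_rng(0)
n_nodes, rows, cols = 5, 4, 3
B = rng.standard_normal((n_nodes, rows, cols))


def grad_fn(i, Xi):
    # Gradient of f_i at Xi is Xi - B_i.
    return Xi - B[i]


W = ring_mixing_matrix(n_nodes)
X = np.zeros((n_nodes, rows, cols))
grads = np.stack([grad_fn(i, X[i]) for i in range(n_nodes)])
Y = grads.copy()  # trackers start at the local gradients
for _ in range(50):
    X, Y, grads = decentralized_muon_step(X, Y, grads, W, grad_fn, lr=0.05)
```

Orthogonalizing the tracker `Y[i]` rather than the raw local gradient is what lets each node follow an estimate of the network-wide average gradient despite heterogeneous local objectives: because `W` is doubly stochastic and `Y` starts at the local gradients, the node-average of `Y` always equals the node-average of the current gradients.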