MuLoCo: Muon is a practical inner optimizer for DiLoCo
Researchers have introduced MuLoCo, a variant of the DiLoCo framework for training large language models (LLMs) that uses Muon as the inner optimizer. DiLoCo's performance degrades as the number of workers increases, and MuLoCo addresses this by focusing on the role of the inner optimizer. Experiments show that MuLoCo yields higher-quality pseudogradients and better training performance across various scales than both standard DiLoCo and data-parallel training.
AI IMPACT: Introduces an optimization technique that could improve the efficiency and scalability of large language model training.
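To make the terms "inner optimizer" and "pseudogradient" concrete, here is a minimal sketch of one DiLoCo-style training loop on a toy quadratic problem. It is an illustration under stated assumptions, not the paper's implementation: the inner optimizer is plain SGD as a stand-in (MuLoCo would run Muon at that point), the outer optimizer is SGD with Nesterov momentum, and all function names, learning rates, and the toy loss are hypothetical.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def inner_sgd(params: Vector, grad_fn: Callable[[Vector], Vector],
              steps: int, lr: float) -> Vector:
    """Stand-in inner optimizer (plain SGD, an assumption for this sketch).

    MuLoCo would run Muon here; this sketch does not implement Muon.
    """
    p = list(params)
    for _ in range(steps):
        g = grad_fn(p)
        p = [pi - lr * gi for pi, gi in zip(p, g)]
    return p


def pseudogradient(global_params: Vector, worker_params: List[Vector]) -> Vector:
    """Average over workers of (global - local) after the inner steps."""
    k = len(worker_params)
    return [
        sum(g - w[i] for w in worker_params) / k
        for i, g in enumerate(global_params)
    ]


def outer_step(global_params: Vector, pseudo_grad: Vector, momentum_buf: Vector,
               lr: float, mu: float) -> Tuple[Vector, Vector]:
    """SGD with Nesterov momentum, treating the pseudogradient as a gradient."""
    new_buf = [mu * b + d for b, d in zip(momentum_buf, pseudo_grad)]
    new_params = [
        p - lr * (d + mu * b)
        for p, d, b in zip(global_params, pseudo_grad, new_buf)
    ]
    return new_params, new_buf


def run_toy(rounds: int = 50, inner_steps: int = 10) -> Vector:
    """Two workers, each minimizing 0.5 * ||p - target||^2 for its own target."""
    targets = [[1.0, -1.0], [3.0, 1.0]]  # hypothetical per-worker data
    params = [0.0, 0.0]
    buf = [0.0, 0.0]
    for _ in range(rounds):
        # Each worker starts from the shared parameters and trains locally.
        locals_ = [
            inner_sgd(params, lambda p, t=t: [pi - ti for pi, ti in zip(p, t)],
                      steps=inner_steps, lr=0.1)
            for t in targets
        ]
        # Workers communicate only once per round, via the pseudogradient.
        delta = pseudogradient(params, locals_)
        params, buf = outer_step(params, delta, buf, lr=0.7, mu=0.5)
    return params


if __name__ == "__main__":
    print(run_toy())
```

In this toy setup the shared parameters settle near the mean of the two worker targets. Swapping the body of `inner_sgd` for a different optimizer changes the local updates, and therefore the pseudogradient, without touching the outer loop, which is the component the summary says MuLoCo focuses on.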