Google DeepMind has introduced Decoupled DiLoCo, a new approach to training advanced AI models that makes training more resilient and flexible across data centers. The system can train models such as Google's 12B-parameter Gemma model across geographically dispersed regions over low-bandwidth networks, and it can even mix hardware generations, such as TPU v6e and TPU v5p. Decoupled DiLoCo is designed to be self-healing. When hardware fails (tested here with artificially induced failures), it isolates the failed units, keeps training with the rest, and reintegrates those units once they come back online. This addresses the synchronization problems that typically stall large-scale AI training.
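The summary does not describe Decoupled DiLoCo's internals. The following is only a minimal Python sketch, under stated assumptions, of the general DiLoCo-style pattern it builds on. Each worker runs several local optimizer steps and reports a pseudo-gradient (global parameters minus its local result). An outer optimizer with Nesterov momentum then applies the average of those pseudo-gradients. To mimic the self-healing behaviour, the sketch assumes that a failed worker is simply left out of the average and, on recovery, rejoins by pulling the current global parameters. The following are all hypothetical, chosen only for illustration: the toy quadratic loss, the plain-SGD inner optimizer (the original DiLoCo recipe uses AdamW), the hyperparameters, and the names `inner_train`, `outer_round`, `train`, and `failures`.

```python
from typing import Dict, List, Set, Tuple

Vector = List[float]


def inner_train(params: Vector, target: Vector, steps: int, lr: float) -> Vector:
    """Hypothetical local worker: `steps` of plain SGD on 0.5 * ||x - target||^2."""
    local = list(params)
    for _ in range(steps):
        # The gradient of 0.5 * ||x - t||^2 with respect to x is (x - t).
        local = [x - lr * (x - t) for x, t in zip(local, target)]
    return local


def outer_round(
    global_params: Vector,
    momentum: Vector,
    targets: List[Vector],
    alive: Set[int],
    inner_steps: int,
    inner_lr: float,
    outer_lr: float,
    beta: float,
) -> Tuple[Vector, Vector]:
    """One outer step that averages pseudo-gradients from live workers only."""
    deltas = []
    for w in sorted(alive):
        # Every live worker starts from the latest global copy; a worker that
        # just recovered rejoins the same way (assumed reintegration policy).
        local = inner_train(global_params, targets[w], inner_steps, inner_lr)
        deltas.append([g - l for g, l in zip(global_params, local)])
    if not deltas:
        # No worker reported this round: keep the global state unchanged.
        return global_params, momentum
    avg = [sum(col) / len(deltas) for col in zip(*deltas)]
    # Outer optimizer: SGD with Nesterov momentum applied to the pseudo-gradient.
    momentum = [beta * m + d for m, d in zip(momentum, avg)]
    update = [d + beta * m for d, m in zip(avg, momentum)]
    new_params = [g - outer_lr * u for g, u in zip(global_params, update)]
    return new_params, momentum


def train(
    targets: List[Vector],
    rounds: int,
    failures: Dict[int, Set[int]],
    inner_steps: int = 10,
    inner_lr: float = 0.1,
    outer_lr: float = 0.7,
    beta: float = 0.9,
) -> Vector:
    """Run `rounds` outer steps; `failures[w]` lists the rounds where worker w is down."""
    dim = len(targets[0])
    params = [0.0] * dim
    momentum = [0.0] * dim
    for r in range(rounds):
        alive = {w for w in range(len(targets)) if r not in failures.get(w, set())}
        params, momentum = outer_round(
            params, momentum, targets, alive, inner_steps, inner_lr, outer_lr, beta
        )
    return params


if __name__ == "__main__":
    # Three workers, each holding a different toy "data shard" (its target).
    shards = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    # Worker 2 is unavailable during rounds 3-5, then comes back online.
    final = train(shards, rounds=40, failures={2: {3, 4, 5}})
    # Should end close to the mean of all shards, [1.0, 1.0].
    print([round(x, 4) for x in final])
```

In this sketch, each worker sends only one pseudo-gradient per round rather than one per step, which is why DiLoCo-style methods suit low-bandwidth links. A dead worker changes only which deltas are averaged, so the remaining workers never wait on it.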
Impact: Enables more robust and flexible large-scale AI model training, potentially reducing costs and increasing accessibility.
Ranking rationale: Introduces a new method for training AI models with a focus on resilience and distributed computing.
AI-generated summary · Google Gemini · from 6 sources.