Researchers have introduced a framework called $\phi$-balancing to improve the training of Mixture-of-Experts (MoE) models. Unlike earlier heuristic approaches, the method targets population-level balance in expert utilization directly. It uses convex duality to derive an efficient online algorithm, giving stable and effective expert routing with minimal overhead during both pretraining and fine-tuning.
Impact: Introduces a principled method for more stable and effective expert utilization in MoE models, potentially improving scalability and performance.
Ranking rationale: The cluster contains an academic paper detailing a new method for training MoE models.
AI-generated summary · Google Gemini · from 2 sources.