Researchers have introduced $\phi$-balancing, a framework for improving the training of Mixture-of-Experts (MoE) models. Unlike previous heuristic approaches, it targets population-level balance in expert utilization directly. The framework uses convex duality to derive an efficient online algorithm, which yields stable and effective expert routing with minimal overhead during pretraining and fine-tuning.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Introduces a principled method for more stable and effective expert utilization in MoE models, potentially improving scalability and performance.
RANK_REASON The cluster contains an academic paper detailing a new method for training MoE models.