New $\phi$-balancing framework improves MoE model training

By PulseAugur Editorial Team · [2 sources] · 2026-05-14 20:39

Researchers have introduced $\phi$-balancing, a framework for improving the training of Mixture-of-Experts (MoE) models. Whereas existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, $\phi$-balancing targets population-level balance of expert utilization directly. The framework uses convex duality to derive an efficient online algorithm, which the authors report yields stable and effective expert routing with minimal overhead during pretraining and fine-tuning.

Impact: Introduces a principled method for more stable and effective expert utilization in MoE models, potentially improving scalability and performance.

Ranking rationale: The cluster contains an academic paper detailing a new method for training MoE models.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

arXiv stat.ML TIER_1 English(EN) · Lizhang Chen, Jonathan Li, Qi Wang, Runlong Liao, Shuozhe Li, Chen Liang, Ni Lao, Qiang Liu · 2026-05-18 04:00

$\phi$-Balancing for Mixture-of-Experts Training

arXiv:2605.15403v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introduc…

arXiv stat.ML TIER_1 English(EN) · Qiang Liu · 2026-05-14 20:39

$\phi$-Balancing for Mixture-of-Experts Training

Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introducing bias relative to population-level objectives. …
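The bias the abstract mentions can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's method: the squared-load penalty, the uniform random router, and the names `squared_penalty`, `batch_load`, and `compare` are all hypothetical choices made for illustration. With 8 perfectly balanced experts, the population-level penalty is 1/8 = 0.125, yet the same penalty computed on 32-token mini-batches averages about 0.125 + (1 - 1/8)/32 ≈ 0.152, because a convex function of a noisy estimate is biased upward (Jensen's inequality).

```python
import random


def squared_penalty(load):
    """Convex balance penalty: sum of squared load fractions (illustrative choice, not the paper's phi)."""
    return sum(f * f for f in load)


def batch_load(num_experts, batch_size, rng):
    """Route batch_size tokens uniformly at random; return per-expert load fractions."""
    counts = [0] * num_experts
    for _ in range(batch_size):
        counts[rng.randrange(num_experts)] += 1
    return [c / batch_size for c in counts]


def compare(num_experts=8, batch_size=32, trials=5000, seed=0):
    """Return (population penalty, mean mini-batch penalty, analytic mini-batch expectation)."""
    rng = random.Random(seed)
    # Population-level load is exactly uniform under this hypothetical router.
    population = squared_penalty([1.0 / num_experts] * num_experts)
    # Average of the penalty evaluated on noisy mini-batch load fractions.
    mean_batch = sum(
        squared_penalty(batch_load(num_experts, batch_size, rng))
        for _ in range(trials)
    ) / trials
    # E[f_i^2] = Var(f_i) + p^2 with p = 1/K, summed over K experts.
    analytic = population + (1.0 - 1.0 / num_experts) / batch_size
    return population, mean_batch, analytic


if __name__ == "__main__":
    population, mean_batch, analytic = compare()
    print(f"population penalty:       {population:.4f}")
    print(f"mean mini-batch penalty:  {mean_batch:.4f}")
    print(f"analytic mini-batch mean: {analytic:.4f}")
```

In this toy setting the gap shrinks as the batch size grows, so a penalty evaluated per mini-batch reports imbalance even when the router is balanced at the population level.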
