PulseAugur

New $\phi$-balancing framework improves MoE model training

Researchers have introduced $\phi$-balancing, a framework for improving the training of Mixture-of-Experts (MoE) models. Existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics; $\phi$-balancing instead targets population-level balance directly to achieve better expert utilization. The framework uses convex duality to derive an efficient online algorithm, which the summary describes as giving stable and effective expert routing with minimal overhead during pretraining and fine-tuning.

Impact: Introduces a principled method for more stable and effective expert utilization in MoE models, potentially improving scalability and performance.

Ranking rationale: The cluster contains an academic paper detailing a new method for training MoE models.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →


Sources [2]

  1. arXiv stat.ML TIER_1 English(EN) · Lizhang Chen, Jonathan Li, Qi Wang, Runlong Liao, Shuozhe Li, Chen Liang, Ni Lao, Qiang Liu

    $\phi$-Balancing for Mixture-of-Experts Training

    arXiv:2605.15403v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introduc…

  2. arXiv stat.ML TIER_1 English(EN) · Qiang Liu

    $\phi$-Balancing for Mixture-of-Experts Training

    Mixture-of-Experts (MoE) models rely on balanced expert utilization to fully realize their scalability. However, existing load-balancing methods are largely heuristic and operate on noisy mini-batch assignment statistics, introducing bias relative to population-level objectives. …