PulseAugur

New parameter E predicts Mixture-of-Experts model health, preventing dead experts.

Researchers have introduced a dimensionless control parameter, E = T*H/(O+B), that predicts whether a Mixture-of-Experts (MoE) model will develop a healthy expert ecology or collapse into "dead experts". E combines four hyperparameters, including the routing temperature T and the routing entropy weight H, and is reported to prevent dead experts without handcrafted load-balancing losses. Experiments across vision and language tasks suggest that E of 0.5 or higher is sufficient to maintain a healthy expert ecology, which would make E a single diagnostic for MoE training.
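As a rough illustration of how the formula and the reported threshold could be used as a pre-training check, here is a minimal Python sketch. It is not the paper's code: the names `MoEHyperparams`, `ecology_parameter`, and `looks_healthy` are hypothetical, the example values are arbitrary, and the meanings of O and B are left open because the source listing truncates them.

```python
from dataclasses import dataclass

# Threshold reported in the summary above (E of 0.5 or higher).
HEALTHY_THRESHOLD = 0.5


@dataclass(frozen=True)
class MoEHyperparams:
    """Hypothetical container for the four hyperparameters that make up E."""
    t: float  # routing temperature T
    h: float  # routing entropy weight H
    o: float  # O: name is truncated in the source listing
    b: float  # B: not named in the text available here


def ecology_parameter(hp: MoEHyperparams) -> float:
    """Compute E = T*H / (O + B)."""
    denom = hp.o + hp.b
    if denom <= 0:
        # Assumption: the ratio is only meaningful for a positive denominator.
        raise ValueError("O + B must be positive")
    return hp.t * hp.h / denom


def looks_healthy(hp: MoEHyperparams, threshold: float = HEALTHY_THRESHOLD) -> bool:
    """Return True when E meets or exceeds the reported threshold."""
    return ecology_parameter(hp) >= threshold


if __name__ == "__main__":
    # Arbitrary illustrative values, not taken from the paper.
    for hp in (MoEHyperparams(1.0, 1.0, 1.0, 1.0),
               MoEHyperparams(1.0, 0.5, 1.0, 1.0)):
        print(hp, ecology_parameter(hp), looks_healthy(hp))
```

Because E is a ratio, halving H while holding the other three values fixed halves E, which is why the second configuration in the example falls below the threshold.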

Impact: Introduces a new diagnostic parameter for MoE training, potentially simplifying the development and improving the stability of large expert models.

Ranking rationale: Academic paper introducing a new control parameter for MoE models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.LG · TIER_1 · English (EN) · Qingjun Zhang

    E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

    arXiv:2605.06415v1. Abstract: We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temp…

  2. arXiv cs.CV · TIER_1 · English (EN) · Qingjun Zhang

    E = T*H/(O+B): A Dimensionless Control Parameter for Mixture-of-Experts Ecology

    We introduce E = T*H/(O+B), a dimensionless control parameter that predicts whether Mixture-of-Experts (MoE) models will develop a healthy expert ecology or collapse into dead experts. E combines four hyperparameters -- routing temperature T, routing entropy weight H, oracle weig…