When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

Study finds MoE models misroute tokens on complex reasoning tasks

By the PulseAugur editorial team · [1 source] · 2026-05-08 05:26

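Researchers have identified a significant problem in Mixture-of-Experts (MoE) language models: the routing mechanism, which directs each token to specific experts, often selects suboptimal routes. Standard routers perform well on tokens where they are highly confident, but on complex reasoning tasks they fail to identify routes that would perform better. This misrouting appears in several mainstream MoE models, including Qwen3, GPT-OSS, DeepSeek-V2, and OLMoE. The study shows that even small updates to the router, with the experts themselves left unchanged, improve performance on challenging math and reasoning benchmarks, which suggests that routing quality is a key bottleneck.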
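To make the routing step concrete, here is a minimal sketch of top-k routing for a single token. It is an illustration only: the scalar "experts", the router scores, and the helper names `top_k_route` and `moe_output` are all hypothetical, and real MoE models differ in details such as whether the softmax is applied before or after the top-k selection.

```python
import math

def top_k_route(logits, k):
    """Return (expert indices, mixing weights) for one token.

    Picks the k largest router logits and renormalizes them with a
    softmax over the selected experts only (an assumed convention).
    """
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    peak = max(logits[i] for i in chosen)          # subtract max for stability
    exps = [math.exp(logits[i] - peak) for i in chosen]
    total = sum(exps)
    return chosen, [e / total for e in exps]

def moe_output(x, experts, logits, k):
    """Weighted sum of the chosen experts' outputs for a scalar input x."""
    chosen, weights = top_k_route(logits, k)
    return sum(w * experts[i](x) for i, w in zip(chosen, weights))

# Toy experts: hypothetical scalar functions, not real network layers.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

chosen, weights = top_k_route(router_logits, k=2)
y = moe_output(3.0, experts, router_logits, k=2)
print(chosen, weights, y)
```

Only the two selected experts are evaluated for this token, which is why the choice of route matters: an expert that is left out contributes nothing to the output.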

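Impact: Identifies a key flaw in MoE routing that holds back reasoning ability, and indicates that targeted router improvements can raise performance on complex tasks.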

Ranking rationale: An academic paper presenting a novel analysis of the routing mechanism in MoE models and its effect on performance.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Jungseul Ok · 2026-05-08 05:26

When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-$k$ router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute al…
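The comparison the abstract describes, holding the model fixed and scoring the router's own route against alternatives that activate the same number of experts, can be sketched on a toy example. This is not the paper's procedure: the scalar experts, the router scores, the squared-error loss, and the function names are hypothetical, and because the toy has only four experts it enumerates every alternative route instead of sampling them.

```python
import itertools
import math

def mix(x, experts, logits, subset):
    """Output when the token is sent to `subset`; weights are a softmax
    over the router logits of the experts in that subset."""
    peak = max(logits[i] for i in subset)
    exps = [math.exp(logits[i] - peak) for i in subset]
    total = sum(exps)
    return sum((e / total) * experts[i](x) for i, e in zip(subset, exps))

def counterfactual_routes(x, target, experts, logits, k):
    """Score the router's top-k route and every other size-k route.

    The model (experts and logits) is held fixed; only the set of
    active experts changes, so every route costs the same compute.
    """
    top_k = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]
    standard = tuple(sorted(top_k))
    losses = {}
    for subset in itertools.combinations(range(len(logits)), k):
        losses[subset] = (mix(x, experts, logits, subset) - target) ** 2
    best = min(losses, key=losses.get)
    return standard, losses[standard], best, losses[best]

# Hypothetical toy setup: scalar experts and made-up router scores.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
router_logits = [0.1, 2.0, -1.0, 1.5]

standard, standard_loss, best, best_loss = counterfactual_routes(
    x=3.0, target=4.0, experts=experts, logits=router_logits, k=2
)
print("standard route:", standard, "loss:", round(standard_loss, 3))
print("best route:    ", best, "loss:", round(best_loss, 3))
```

In this toy, a token counts as misrouted when some equal-size route gives a lower loss than the one the router picked. The criterion the authors actually use is not stated in the truncated abstract above.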
