PulseAugur

MoE models misroute tokens on complex reasoning tasks, study finds

Researchers have found that the router in Mixture-of-Experts (MoE) language models, which directs each token to a small subset of experts, often selects suboptimal routes. The standard router performs well on confident tokens, but it fails to find better-performing routes on complex reasoning tasks. This misrouting appears across several prominent MoE models, including Qwen3, GPT-OSS, DeepSeek-V2, and OLMoE. The study suggests that even a minor update to the router, with the experts themselves left unchanged, can improve performance on challenging math and reasoning benchmarks, indicating that routing is a key bottleneck.

Impact: Identifies a flaw in MoE routing that hinders reasoning, suggesting that targeted router improvements could boost performance on complex tasks.

Ranking rationale: Academic paper presenting a novel analysis of MoE routing mechanisms and their impact on performance.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →


Sources [1]

  1. arXiv cs.CL · English (EN) · Jungseul Ok

    When Are Experts Misrouted? Counterfactual Routing Analysis in Mixture-of-Experts Language Models

    Mixture-of-Experts (MoE) language models route each token to a small subset of experts, but whether the routes selected by a trained top-$k$ router are good ones is rarely evaluated directly. Holding the model fixed, we compare each standard route against sampled equal-compute al…
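For readers unfamiliar with top-$k$ routing, the idea in the abstract can be sketched roughly as follows. This is an illustrative sketch, not the paper's method: it assumes "equal compute" means an alternative route that uses the same number $k$ of experts, and the function names, logits, and gating convention (softmax over the selected experts only) are all hypothetical. It only builds the standard route and one sampled alternative; deciding which route is better would require running the model on each.

```python
import math
import random


def top_k_route(logits, k):
    """Return the indices of the k largest router logits, largest first."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def gate_weights(logits, route):
    """Softmax over the selected experts' logits only (one common convention)."""
    peak = max(logits[i] for i in route)
    exps = [math.exp(logits[i] - peak) for i in route]
    total = sum(exps)
    return [e / total for e in exps]


def sample_alternative_route(num_experts, k, standard_route, rng):
    """Sample a different set of k experts, keeping per-token compute equal.

    Assumption: "equal compute" is read here as "same number of experts".
    """
    if not 0 < k < num_experts:
        raise ValueError("need 0 < k < num_experts to have an alternative")
    standard = set(standard_route)
    while True:
        candidate = rng.sample(range(num_experts), k)
        if set(candidate) != standard:
            return sorted(candidate)


# Hypothetical router logits for a single token over 8 experts.
logits = [0.1, 2.3, -0.5, 1.7, 0.0, 0.9, -1.2, 0.4]
k = 2
rng = random.Random(0)

standard = top_k_route(logits, k)           # experts 1 and 3 have the top logits
weights = gate_weights(logits, standard)    # mixing weights, summing to 1
alternative = sample_alternative_route(len(logits), k, standard, rng)
```

A counterfactual comparison in this spirit would hold the model fixed, run the token through `standard` and through `alternative`, and compare a quality measure such as next-token loss.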