A theoretical model for task routing in mixture-of-expert transformers

New theory explains task-expert specialization in MoE Transformers

By the PulseAugur editorial team · [2 sources] · 2026-06-12 12:35

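Researchers have developed a theoretical model that uses a discrete representation of language to explain task-expert specialization in mixture-of-experts (MoE) Transformer models. The work addresses the limitations of existing continuous models by showing how a single-layer MoE Transformer encodes knowledge through task-specific experts. According to the model, each query is routed to an expert whose size is determined by the intrinsic complexity of the task, which gives theoretical support to the localized knowledge circuits observed in MoE architectures.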
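For readers unfamiliar with MoE routing, the following is a minimal, generic sketch of top-1 routing in a single MoE layer. It is not the paper's model: the class names (`Expert`, `Top1MoELayer`), the random weights, and the hidden sizes `[2, 8, 32]` are all hypothetical choices made for illustration. It only shows the mechanism the summary refers to, namely that a gate scores the experts, one expert is selected per query, and experts may differ in size.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def make_matrix(rows, cols, rng):
    """Random weights; purely illustrative, nothing is trained here."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class Expert:
    """Two-layer ReLU feed-forward block with its own hidden width."""

    def __init__(self, d_model, d_hidden, rng):
        self.d_hidden = d_hidden
        self.w_in = make_matrix(d_hidden, d_model, rng)
        self.w_out = make_matrix(d_model, d_hidden, rng)

    def __call__(self, x):
        hidden = [max(0.0, h) for h in matvec(self.w_in, x)]
        return matvec(self.w_out, hidden)


class Top1MoELayer:
    """Hypothetical MoE layer: a linear gate picks one expert per query."""

    def __init__(self, d_model, hidden_sizes, seed=0):
        rng = random.Random(seed)
        self.gate = make_matrix(len(hidden_sizes), d_model, rng)
        self.experts = [Expert(d_model, h, rng) for h in hidden_sizes]

    def route(self, x):
        """Return the index of the highest-scoring expert and all gate probabilities."""
        probs = softmax(matvec(self.gate, x))
        index = max(range(len(probs)), key=probs.__getitem__)
        return index, probs

    def __call__(self, x):
        index, probs = self.route(x)
        # Only the selected expert runs, so per-query compute does not
        # grow with the number of experts.
        return [probs[index] * y for y in self.experts[index](x)]


# Assumption: three experts of different sizes, chosen arbitrarily.
layer = Top1MoELayer(d_model=4, hidden_sizes=[2, 8, 32])
query = [0.5, -1.0, 0.25, 2.0]
expert_index, gate_probs = layer.route(query)
output = layer(query)
print("selected expert:", expert_index, "hidden size:", layer.experts[expert_index].d_hidden)
```

In this sketch the expert sizes are fixed by hand. The summary above describes a theoretical result in which expert size is tied to task complexity, which this code does not attempt to reproduce.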

Impact: Provides a theoretical foundation for MoE architectures that may guide future model development and optimization.

Ranking rationale: This cluster contains an academic paper that details a theoretical model of MoE Transformers.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Yongli Xiang, Vinoth Nandakumar, Yunzhi Yao, Peike Li, Tongliang Liu · 2026-06-15 04:00

A theoretical model for task routing in mixture-of-expert transformers

arXiv:2606.14398v1 Announce Type: new Abstract: Mixture-of-experts (MoE) layers enable the scaling of transformer models while keeping the inference compute fixed. While task-expert specialization has been observed in empirical studies of frontier MoE transformer models, existing…

arXiv cs.LG TIER_1 English(EN) · Tongliang Liu · 2026-06-12 12:35

A theoretical model for task routing in mixture-of-expert transformers

Mixture-of-experts (MoE) layers enable the scaling of transformer models while keeping the inference compute fixed. While task-expert specialization has been observed in empirical studies of frontier MoE transformer models, existing theoretical work analyzes this using continuous…
