SoftMoE introduces differentiable routing for Mixture-of-Experts models in LLMs

By the PulseAugur editorial team · [1 source] · 2026-06-16 14:05

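Researchers have introduced SoftMoE, a new approach to Mixture-of-Experts (MoE) architectures for large language models (LLMs). Unlike conventional sparse MoE models, which use discrete top-k routing, SoftMoE uses a soft, differentiable routing mechanism. This allows expert allocation to be optimized with gradients, so the model can learn more effectively how to distribute compute across layers. The method is reported to match or exceed existing sparse MoE models while using fewer experts, with later layers showing more expert activation.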
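To make the contrast between discrete top-k routing and soft routing concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's method: the summary does not spell out SoftMoE's routing rule, so a plain temperature softmax over router logits stands in for "soft, differentiable routing". The function names (`topk_gates`, `soft_gates`, `moe_output`), the `temperature` parameter, and the linear experts are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries set to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def topk_gates(logits, k):
    """Discrete top-k routing: keep the k largest logits per token,
    renormalize over them, and give every other expert weight 0."""
    idx = np.argsort(logits, axis=-1)[:, -k:]          # indices of the k largest
    mask = np.zeros_like(logits, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=-1)
    return softmax(np.where(mask, logits, -np.inf))

def soft_gates(logits, temperature=1.0):
    """Stand-in for soft routing (assumption): a dense softmax, so every
    expert gets a nonzero weight that varies smoothly with the logits."""
    return softmax(logits / temperature)

def moe_output(x, expert_weights, gates):
    """Combine hypothetical linear experts.
    x: (tokens, d), expert_weights: (experts, d, d), gates: (tokens, experts)."""
    expert_out = np.einsum("td,edh->teh", x, expert_weights)
    return np.einsum("te,teh->th", gates, expert_out)

rng = np.random.default_rng(0)
tokens, d, experts = 4, 16, 8
x = rng.normal(size=(tokens, d))
router = rng.normal(size=(d, experts))                 # hypothetical router matrix
expert_weights = rng.normal(size=(experts, d, d))

logits = x @ router
y_sparse = moe_output(x, expert_weights, topk_gates(logits, k=2))
y_soft = moe_output(x, expert_weights, soft_gates(logits))
```

In the top-k variant, the choice of which experts are active is a hard selection, so small changes in the logits either leave the selection unchanged or flip it abruptly. In the dense softmax variant, every gate weight is a smooth function of the logits, which is what makes gradient-based optimization of the allocation possible. Note that this dense stand-in evaluates every expert for every token, so it says nothing about how SoftMoE keeps inference cost bounded.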

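Impact: Introduces a more efficient, easier-to-optimize routing mechanism for MoE architectures, with the potential to improve LLM performance and resource utilization.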

Ranking rationale: This cluster contains an academic paper detailing a new architectural approach for LLMs.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Marcin Kurdziel · 2026-06-16 14:05

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

Sparse Mixture-of-Experts (MoE) architectures enable scaling LLM parameters under a fixed inference budget by activating only a small subset of experts via top-$k$ routing. While this preserves causality and suits autoregressive language models, the discrete top-$k$ operator is n…
