DOT-MoE: Differentiable Optimal Transport for MoEfication

DOT-MoE framework converts dense models into sparse MoE

By the PulseAugur editorial team · [1 source] · 2026-06-02 04:00

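Researchers have introduced DOT-MoE, a new framework that converts dense large language models into sparse Mixture-of-Experts (MoE) architectures. The method formulates the decomposition of dense layers as a differentiable optimal transport problem. It uses differentiable Sinkhorn-Knopp iterations to manage expert capacity, and Straight-Through Estimators to learn neuron-to-expert assignment and token routing end to end. In experiments, DOT-MoE outperforms existing methods, retaining 90% of the dense model's performance while halving the number of activated parameters.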
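The summary does not give the paper's exact formulation, so the sketch below is only a rough illustration of the general idea. It solves a generic entropic optimal transport problem with Sinkhorn-Knopp scaling. Each of a layer's neurons carries one unit of mass, and each expert has an equal capacity. The sketch also shows the row-wise argmax that a straight-through estimator would use as its forward value. The random cost matrix, the `eps` value, the equal-capacity marginals and the function names `sinkhorn` and `hard_assignment` are all hypothetical assumptions, not DOT-MoE's actual method or API.

```python
import math
import random


def sinkhorn(cost, row_marginals, col_marginals, eps=0.25, n_iters=200):
    """Entropic optimal transport via Sinkhorn-Knopp scaling (generic sketch).

    Returns a plan P = diag(u) K diag(v) whose column sums equal col_marginals
    after the final update and whose row sums converge to row_marginals.
    """
    n, m = len(cost), len(cost[0])
    # Gibbs kernel K = exp(-C / eps); smaller eps gives a sharper plan but slower convergence.
    K = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so each neuron's total mass matches its marginal (assigned once).
        u = [row_marginals[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Rescale columns so each expert receives exactly its capacity.
        v = [col_marginals[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]


def hard_assignment(plan):
    """Row-wise argmax over experts for each neuron.

    In a straight-through estimator, this one-hot choice would be used in the
    forward pass while gradients flow through the soft plan. Plain Python has no
    autodiff, so only the forward values are shown. Note that argmax alone does
    not guarantee that each expert's capacity is met exactly.
    """
    return [max(range(len(row)), key=row.__getitem__) for row in plan]


if __name__ == "__main__":
    random.seed(0)
    n_neurons, n_experts = 8, 4
    # Hypothetical cost: lower means a neuron fits an expert better.
    cost = [[random.random() for _ in range(n_experts)] for _ in range(n_neurons)]
    rows = [1.0] * n_neurons                    # each neuron carries one unit of mass
    cols = [n_neurons / n_experts] * n_experts  # each expert holds an equal share
    plan = sinkhorn(cost, rows, cols)
    print("expert loads:", [round(sum(col), 3) for col in zip(*plan)])
    print("neuron -> expert:", hard_assignment(plan))
```

The capacity constraint lives in the column marginals. Changing `cols` to unequal values would model experts of different sizes, and lowering `eps` pushes the soft plan closer to a hard assignment.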

Impact: Converting dense architectures into sparse MoE enables more efficient large language model inference.

Ranking rationale: This is a paper detailing a new method for converting model architectures.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Udbhav Bamba, Arnav Chavan, Aryamaan Thakur, Steve Teig, Deepak Gupta · 2026-06-02 04:00

DOT-MoE: Differentiable Optimal Transport for MoEfication

arXiv:2606.01666v1 Announce Type: cross Abstract: The scaling of Large Language Models (LLMs) has driven significant performance gains but created substantial challenges in inference efficiency. While Mixture of Experts (MoEs) architectures address this by decoupling model size f…
