ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

ConMoE framework compresses MoE models without retraining

By PulseAugur Editorial Team · [1 source] · 2026-05-29 04:00

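Researchers have developed ConMoE, a framework that compresses Mixture-of-Experts (MoE) language models without retraining. The method consolidates the expert pool by reassigning original expert references to a smaller set of selected prototypes. ConMoE uses calibration-based signals to choose which experts to retain and how to remap invocations, which preserves the original router interface. Experiments on models including deepseek-moe-16b-base and Qwen3-30B-A3B show that ConMoE achieves competitive or superior compression compared with existing pruning and merging techniques.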
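The reassignment idea can be illustrated with a small sketch. This is not the paper's algorithm: the selection rule (keep the experts most used on calibration data), the similarity matrix, and the names `consolidate_expert_pool` and `route` are all hypothetical, chosen only to show how a remap table lets the router keep emitting original expert indices while fewer experts are stored.

```python
from typing import Dict, List, Sequence


def consolidate_expert_pool(
    usage_counts: Sequence[int],
    similarity: Sequence[Sequence[float]],
    num_prototypes: int,
) -> Dict[int, int]:
    """Map every original expert index to a retained prototype index.

    Assumption (not from the paper): prototypes are the experts with the
    highest calibration usage, and each dropped expert is redirected to
    the prototype it is most similar to.
    """
    num_experts = len(usage_counts)
    if not 0 < num_prototypes <= num_experts:
        raise ValueError("num_prototypes must be in [1, num_experts]")

    # Rank experts by calibration usage, breaking ties by index.
    ranked = sorted(range(num_experts), key=lambda e: (-usage_counts[e], e))
    prototypes = sorted(ranked[:num_prototypes])

    remap: Dict[int, int] = {}
    for expert in range(num_experts):
        if expert in prototypes:
            remap[expert] = expert  # retained experts map to themselves
        else:
            # Redirect a dropped expert to its most similar prototype.
            remap[expert] = max(prototypes, key=lambda p: similarity[expert][p])
    return remap


def route(router_choice: List[int], remap: Dict[int, int]) -> List[int]:
    """Translate the router's original expert indices into stored prototypes."""
    return [remap[expert] for expert in router_choice]


# Toy calibration statistics for four experts (made-up numbers).
usage_counts = [120, 5, 80, 7]
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
remap = consolidate_expert_pool(usage_counts, similarity, num_prototypes=2)
print(remap)                    # {0: 0, 1: 0, 2: 2, 3: 2}
print(route([1, 3, 0], remap))  # [0, 2, 0]
```

In this toy setup the router is untouched: it still selects among four expert indices, and only the lookup table decides which of the two stored experts actually runs.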

Impact: This research offers a way to reduce the memory footprint of MoE models, which may make them easier to deploy.

Ranking rationale: This is an academic paper detailing a new compression method for MoE models.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong, Yaoming Li, Tong Yang · 2026-05-29 04:00

ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

arXiv:2605.29350v1. Abstract: Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-training compression methods mainly shrink this cost by pruni…

