Generic Expert Coverage for Pruning Sparse Mixture-of-Experts Language Models

New method prunes MoE language models using generic text corpora

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

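Researchers have developed a new method, Generic TB-Coverage, for pruning sparsely activated Mixture-of-Experts (MoE) language models. The technique addresses the challenge of removing redundant experts without task-specific downstream calibration data. Using generic text corpora such as WikiText2 and C4, Generic TB-Coverage profiles each expert's utility on each corpus separately and ensures that the high-utility experts of every corpus are retained. On models such as Qwen1.5-MoE-A2.7B and DeepSeek-MoE-16B-Base, the approach yields higher average accuracy and smaller perplexity degradation, particularly under aggressive pruning.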
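The coverage idea in the summary can be illustrated with a small sketch. It is not the paper's algorithm: the summary does not say how utility is scored or how the retained set is assembled, so the function name `select_experts`, the `per_corpus_quota` parameter, the fill rule (mean normalized utility), and the toy scores below are all assumptions made for illustration.

```python
from typing import Dict, List, Set


def select_experts(
    utility: Dict[str, List[float]],
    keep: int,
    per_corpus_quota: int,
) -> Set[int]:
    """Choose `keep` expert indices while guaranteeing per-corpus coverage.

    `utility[corpus][e]` is a hypothetical utility score for expert `e`,
    measured on that corpus alone (higher means more useful).
    """
    n_experts = len(next(iter(utility.values())))
    if any(len(scores) != n_experts for scores in utility.values()):
        raise ValueError("every corpus must score the same number of experts")
    if per_corpus_quota * len(utility) > keep:
        raise ValueError("per-corpus quotas exceed the keep budget")

    kept: Set[int] = set()

    # Step 1 (coverage): reserve the top experts of each corpus separately,
    # so no corpus loses its highest-utility experts.
    for scores in utility.values():
        ranked = sorted(range(n_experts), key=lambda e: scores[e], reverse=True)
        kept.update(ranked[:per_corpus_quota])

    # Step 2 (assumed fill rule): spend the remaining budget on the experts
    # with the highest mean share of utility across corpora.
    def mean_share(e: int) -> float:
        return sum(s[e] / (sum(s) or 1.0) for s in utility.values()) / len(utility)

    rest = sorted(
        (e for e in range(n_experts) if e not in kept),
        key=mean_share,
        reverse=True,
    )
    kept.update(rest[: keep - len(kept)])
    return kept


# Toy scores for four experts on two corpora (made-up numbers).
utility = {
    "wikitext2": [0.90, 0.10, 0.05, 0.40],
    "c4":        [0.10, 0.05, 0.80, 0.50],
}
print(sorted(select_experts(utility, keep=3, per_corpus_quota=1)))  # [0, 2, 3]
```

With a budget of two experts, ranking by pooled average utility on these toy numbers would keep experts 0 and 3 and drop expert 2, the strongest expert on C4. The per-corpus reservation keeps experts 0 and 2 instead, which is the kind of protection the summary attributes to profiling each corpus separately.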

Impact: By shrinking models without significant performance loss, the method could make large MoE models more efficient to deploy.

Ranking rationale: This cluster contains one research paper detailing a new method for pruning language models.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Yongqin Zeng, Sicheng Pan, Jiale Wang, Hai-tao Zheng, Hong-Gee Kim, Chunxia Ma, XiuTeng Zhou · 2026-07-03 04:00

Generic Expert Coverage for Pruning Sparse Mixture-of-Experts Language Models

arXiv:2607.01710v1. Abstract: Sparsely activated Mixture-of-Experts (MoE) language models contain substantial structured redundancy among routed experts, but pruning them without downstream calibration data remains challenging. Existing expert-pruning methods ty…
