PulseAugur

ConMoE framework compresses MoE models without retraining

Researchers have developed ConMoE, a framework for compressing Mixture-of-Experts (MoE) language models without retraining. The method consolidates the expert pool by reassigning original expert references to a smaller set of selected prototype experts. ConMoE uses calibration-based signals to choose which experts to retain and how to remap calls, and it preserves the original router interface. Experiments on models such as deepseek-moe-16b-base and Qwen3-30B-A3B show that ConMoE achieves compression results competitive with or superior to existing pruning and merging techniques.
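The reassignment idea can be illustrated with a minimal sketch. This is not the paper's algorithm: the summary does not say which calibration signals ConMoE uses, so the routing counts (`usage`), the per-expert output vectors (`signatures`), the cosine-similarity rule, and the function names `build_remap` and `moe_forward` are all assumptions made for illustration. The only point it tries to show is that the router keeps emitting original expert ids while a lookup table redirects them to a smaller set of stored prototypes.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_remap(usage, signatures, keep):
    """Pick `keep` prototypes and map every original expert id to one of them.

    usage:      calibration routing count per expert (assumed signal)
    signatures: one calibration-output vector per expert (assumed signal)
    """
    ranked = sorted(range(len(usage)), key=lambda e: usage[e], reverse=True)
    prototypes = sorted(ranked[:keep])
    remap = {}
    for e in range(len(usage)):
        if e in prototypes:
            remap[e] = e  # retained experts map to themselves
        else:
            # dropped experts are redirected to the most similar prototype
            remap[e] = max(
                prototypes,
                key=lambda p: cosine(signatures[e], signatures[p]),
            )
    return prototypes, remap


def moe_forward(x, router_top_k, experts, remap):
    """The router still emits original expert ids; only the lookup changes."""
    out = 0.0
    for expert_id, weight in router_top_k:
        out += weight * experts[remap[expert_id]](x)
    return out


# Toy example: four original experts consolidated into two prototypes.
usage = [50, 5, 40, 3]
signatures = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
prototypes, remap = build_remap(usage, signatures, keep=2)

# Only the prototype experts need to be stored after consolidation.
experts = {0: lambda x: 2 * x, 2: lambda x: x + 1}
y = moe_forward(3.0, [(1, 0.5), (3, 0.5)], experts, remap)
```

In this toy setup the router is untouched and still selects experts 1 and 3, but both calls resolve to stored prototypes, so the memory saving comes from the smaller `experts` table and not from any change to routing.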

IMPACT This research offers a method to reduce the memory footprint of MoE models, potentially making them more accessible for deployment.

RANK_REASON This is a research paper detailing a new method for compressing MoE models.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Yilun Yao, Jiaming Pan, Elsie Dai, Peizhuang Cong, Yaoming Li, Tong Yang

    ConMoE: Expert-Pool Consolidation via Prototype Reassignment for MoE Compression

    arXiv:2605.29350v1 Announce Type: new Abstract: Mixture-of-Experts (MoE) language models reduce per-token computation but still require storing and serving all experts, making deployment memory-intensive. Existing post-training compression methods mainly shrink this cost by pruni…