SHAPE framework prunes MoE LLMs by modeling expert coalitions

By PulseAugur Editorial · [1 source] · 2026-06-10 04:00

Researchers have introduced SHAPE, a framework for pruning experts in sparse Mixture-of-Experts (MoE) large language models. Where previous methods evaluated each expert independently, SHAPE accounts for the cooperative nature of MoE inference, in which experts work together in coalitions. The framework uses a Shapley-style attribution to identify the experts that are crucial to high-utility coalitions, which makes pruning more effective. In experiments on models including Qwen3-30B-A3B, GPT-OSS-20B, and DeepSeek-V2-Lite, SHAPE reduced memory footprint without substantial accuracy loss, even with up to 40% of experts pruned.
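To make the coalition idea concrete, here is a minimal sketch of generic Shapley attribution over a toy pool of four experts. It is not SHAPE's actual estimator, which this summary does not describe. The utility function, the expert indices, and the names shapley_values, toy_utility, and keep_count are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(players, utility):
    """Exact Shapley values: average each player's marginal
    contribution over every possible ordering of the players."""
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = frozenset()
        prev = utility(coalition)
        for p in order:
            coalition = coalition | {p}
            cur = utility(coalition)
            totals[p] += cur - prev  # marginal contribution of p
            prev = cur
    return {p: t / len(orderings) for p, t in totals.items()}


def toy_utility(coalition):
    """Hypothetical utility (an assumption, not from the paper):
    experts 0 and 1 are only useful together, expert 2 adds a
    small gain on its own, and expert 3 adds nothing."""
    score = 0.0
    if 0 in coalition and 1 in coalition:
        score += 1.0
    if 2 in coalition:
        score += 0.3
    return score


experts = [0, 1, 2, 3]

# Independent scoring: each expert evaluated alone.
solo = {e: toy_utility(frozenset({e})) for e in experts}

# Coalition-aware scoring: Shapley attribution.
phi = shapley_values(experts, toy_utility)

# Keep the highest-scoring experts and prune the rest.
keep_count = 3
ranked = sorted(experts, key=lambda e: phi[e], reverse=True)
kept = sorted(ranked[:keep_count])
pruned = sorted(ranked[keep_count:])

print("solo scores:   ", solo)
print("shapley scores:", phi)
print("kept:", kept, "pruned:", pruned)
```

In this toy setup, independent scoring gives experts 0 and 1 a score of zero, the same as the useless expert 3, because neither helps alone. The Shapley attribution splits their joint gain evenly between them, so only expert 3 is pruned. Exact enumeration grows factorially with the number of experts, so anything at the scale of a real MoE layer would need sampling or another approximation.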

IMPACT Enables more efficient deployment of large MoE models by reducing memory requirements without substantial accuracy loss.

RANK_REASON The cluster contains a research paper detailing a new method for pruning MoE LLMs.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Yuhao Zhang · 2026-06-10 04:00

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

arXiv:2606.09886v1 · Announce Type: cross · Abstract: Sparse Mixture-of-Experts (MoE) large language models achieve strong quality with low per-token compute, yet their deployment is often limited by the memory wall: the full expert pool must remain resident to support token-dependen…
