Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 7h

Post-Trained MoE Can Skip Half Experts via Self-Distillation

Researchers have developed a framework called Zero-Expert Self-Distillation Adaptation (ZEDA) to make Mixture-of-Experts (MoE) language models more efficient. ZEDA lets post-trained static MoE models dynamically skip more than half of their experts during inference with minimal accuracy loss. The method was evaluated on Qwen3-30B-A3B and GLM-4.7-Flash, where it substantially reduced computation and delivered an inference speedup of roughly 1.20x.
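The brief does not describe ZEDA's routing rule or training objective. Below is a minimal sketch that assumes the "zero experts" in the name are extra router slots that produce no output, so every zero slot chosen in the top-k means one fewer expert FFN is evaluated. It also assumes that "self-distillation" trains the adapted model to match the original model's output distribution with a KL loss. `route_with_zero_experts`, `moe_forward`, and `kl_divergence` are hypothetical illustrations, not the authors' code.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical illustration of zero-expert routing; not the ZEDA implementation.
Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> Vector:
    # Numerically stable softmax over all router slots.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_with_zero_experts(
    router_logits: Sequence[float], num_real: int, k: int
) -> List[Tuple[int, float]]:
    """Top-k routing over real experts plus assumed zero-expert slots.

    Slots with index >= num_real are zero experts: they output nothing, so
    every zero slot in the top-k is one real expert FFN that is skipped.
    Returns only the (real_expert_index, gate_weight) pairs to evaluate.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the chosen k slots
    return [(i, probs[i] / norm) for i in top if i < num_real]


def moe_forward(
    x: Vector, experts: Sequence[Expert], router_logits: Sequence[float], k: int
) -> Tuple[Vector, int]:
    """Weighted sum of the selected real experts; zero experts add nothing."""
    routed = route_with_zero_experts(router_logits, len(experts), k)
    out = [0.0] * len(x)
    for idx, weight in routed:
        y = experts[idx](x)
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, len(routed)  # output and number of expert FFNs actually run


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q): assumed distillation loss, p from the original (teacher) MoE."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Four toy experts that scale their input by 1, 2, 3 and 4.
    experts = [lambda v, s=s: [s * a for a in v] for s in (1.0, 2.0, 3.0, 4.0)]
    logits = [2.0, 0.5, 1.5, 0.1, 1.8, 1.7, -1.0, -1.0]  # 4 real + 4 zero slots
    out, used = moe_forward([1.0, 1.0], experts, logits, k=4)
    print(used)  # 2: two of the four top-k slots were zero experts
```

In this toy example the router still picks four slots per token, but two of them are zero experts, so only half of the expert FFNs run. Renormalizing the gate weights over all k chosen slots (including zero slots) is one design choice among several; it lets a zero expert absorb part of the routing mass instead of inflating the remaining experts' weights.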

IMPACT: Reduces inference costs for MoE models, potentially accelerating their deployment and adoption.

GLM-4.7-Flash
Qwen3-30B-A3B
Mixture-of-Experts (MoE)
Zero-Expert Self-Distillation Adaptation (ZEDA)
Xingtai Lv