How to Score Experts for One-Shot MoE Expert Pruning: A Unified Formulation and Selection Principle
Researchers have proposed a unified formulation for one-shot expert pruning in Mixture-of-Experts (MoE) language models. The formulation organizes pruning criteria around three factors: routing frequency, gate weighting, and activation strength. It yields a principle for selecting a criterion according to whether the pruning setting is task-agnostic or task-specific. The work also introduces two task-agnostic criteria, Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN), which are reported to perform strongly across multiple MoE models and benchmarks.
IMPACT This research offers a more systematic approach to pruning MoE models for deployment, potentially leading to lower memory usage and better post-pruning performance across tasks.
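
To make the three factors named in the summary concrete, here is a minimal Python sketch of per-expert scores computed from calibration data. The summary gives no formulas, so everything here is an assumption for illustration: the function names `expert_scores` and `lowest_scoring` are hypothetical, MAN is taken to be the mean L2 norm of an expert's output over the tokens routed to it, MSAN the mean of the squared norm, and experts that receive no tokens are scored zero. The paper's exact definitions may differ.

```python
from typing import Dict, List


def expert_scores(
    gate_probs: List[List[float]],
    act_norms: List[List[float]],
    top_k: int = 1,
) -> Dict[str, List[float]]:
    """Illustrative per-expert scores for one MoE layer.

    gate_probs[t][e]: router probability of expert e for token t.
    act_norms[t][e]:  L2 norm of expert e's output for token t
                      (assumed to be precomputed on calibration data).
    """
    n_tokens = len(gate_probs)
    n_experts = len(gate_probs[0])
    counts = [0] * n_experts
    gate_sum = [0.0] * n_experts
    norm_sum = [0.0] * n_experts
    sq_norm_sum = [0.0] * n_experts

    for probs, norms in zip(gate_probs, act_norms):
        # Top-k routing: keep the k experts with the largest gate probability.
        routed = sorted(range(n_experts), key=lambda e: -probs[e])[:top_k]
        for e in routed:
            counts[e] += 1
            gate_sum[e] += probs[e]
            norm_sum[e] += norms[e]
            sq_norm_sum[e] += norms[e] ** 2

    def mean(total: float, count: int) -> float:
        # Assumption: an expert that is never routed to gets score 0.
        return total / count if count else 0.0

    return {
        # Routing frequency: share of tokens sent to each expert.
        "frequency": [c / n_tokens for c in counts],
        # Gate weighting: gate probability mass per token, routed tokens only.
        "gate_mass": [g / n_tokens for g in gate_sum],
        # Activation strength (assumed MAN): mean output norm over routed tokens.
        "man": [mean(s, c) for s, c in zip(norm_sum, counts)],
        # Assumed MSAN: mean squared output norm over routed tokens.
        "msan": [mean(s, c) for s, c in zip(sq_norm_sum, counts)],
    }


def lowest_scoring(scores: List[float], n_prune: int) -> List[int]:
    """Return the indices of the n_prune lowest-scoring experts, sorted."""
    order = sorted(range(len(scores)), key=lambda e: scores[e])
    return sorted(order[:n_prune])
```

In a one-shot setting, a sketch like this would be run once over a calibration set, after which `lowest_scoring(scores["msan"], n_prune)` (or any other criterion) picks the experts to drop, with no retraining afterwards. Which criterion to prefer in a task-agnostic versus task-specific setting is what the selection principle addresses; this sketch does not encode that choice.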