Researchers have developed a unified formulation for one-shot expert pruning in Mixture-of-Experts (MoE) language models. The formulation organizes pruning criteria around three signals: routing frequency, gate weighting, and activation strength. It yields a principle for choosing a criterion according to whether the pruning setting is task-agnostic or task-specific. The work also introduces two task-agnostic criteria, Mean Activation Norm (MAN) and Mean Squared Activation Norm (MSAN), which are reported to perform strongly across various MoE models and benchmarks.
AI IMPACT This research offers a more systematic approach to pruning MoE models for deployment, potentially leading to lower memory usage and better post-pruning performance across tasks.
RANK_REASON The cluster contains a research paper published on arXiv detailing a new formulation and selection principle for one-shot MoE expert pruning.
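The summary names the three signals (routing frequency, gate weighting, activation strength) but does not give formulas, so the following is only a rough sketch of how such per-expert scores might be computed from calibration data. It assumes that MAN is the average L2 norm of an expert's output over the tokens routed to it and that MSAN is the average of the squared norm; the paper's exact definitions may differ. The function names `expert_statistics` and `experts_to_prune` and all input arrays are hypothetical.

```python
import numpy as np

def expert_statistics(expert_ids, gate_weights, act_norms, num_experts):
    """Per-expert statistics from top-k routing records.

    expert_ids   : (tokens, k) int array, experts chosen for each token
    gate_weights : (tokens, k) float array, router weight of each choice
    act_norms    : (tokens, k) float array, L2 norm of each chosen expert's output
    All three inputs are assumed to come from a calibration pass (hypothetical).
    """
    ids = expert_ids.ravel()
    counts = np.bincount(ids, minlength=num_experts).astype(float)
    safe = np.maximum(counts, 1.0)  # avoid division by zero for unused experts
    gate_sum = np.bincount(ids, weights=gate_weights.ravel(), minlength=num_experts)
    norm_sum = np.bincount(ids, weights=act_norms.ravel(), minlength=num_experts)
    sq_sum = np.bincount(ids, weights=act_norms.ravel() ** 2, minlength=num_experts)
    return {
        "frequency": counts / ids.size,  # share of routing slots per expert
        "mean_gate": gate_sum / safe,    # average router weight when selected
        "man": norm_sum / safe,          # assumed Mean Activation Norm
        "msan": sq_sum / safe,           # assumed Mean Squared Activation Norm
    }

def experts_to_prune(scores, num_to_prune):
    """Indices of the lowest-scoring experts (one-shot, no retraining)."""
    return np.argsort(scores, kind="stable")[:num_to_prune]

# Toy calibration data: 3 tokens, top-2 routing, 4 experts (expert 3 is never used).
expert_ids = np.array([[0, 1], [0, 2], [1, 2]])
gate_weights = np.array([[0.5, 0.5], [0.75, 0.25], [0.5, 0.5]])
act_norms = np.array([[1.0, 2.0], [3.0, 4.0], [2.0, 6.0]])

stats = expert_statistics(expert_ids, gate_weights, act_norms, num_experts=4)
pruned = experts_to_prune(stats["msan"], num_to_prune=2)
```

In this toy data, experts 0 and 1 have the same mean norm but different mean squared norms, which illustrates why the two scores can rank experts differently.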
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Hugging Face
- IArxiv
- Mean Activation Norm
- Mean Squared Activation Norm
- mixture of experts
AI-generated summary · Google Gemini · from 1 source.