Researchers have developed a new method called Generic TB-Coverage for pruning sparsely activated Mixture-of-Experts (MoE) language models. The technique addresses the problem of removing redundant experts without requiring task-specific downstream calibration data. Instead, it uses generic text corpora such as WikiText2 and C4: Generic TB-Coverage profiles per-expert utility separately on each corpus and ensures that the high-utility experts from every corpus are retained. The approach has shown higher average accuracy and smaller perplexity degradation on models such as Qwen1.5-MoE-A2.7B and DeepSeek-MoE-16B-Base, particularly under aggressive pruning.
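The summary does not say how utility is measured or exactly how experts from each corpus are combined, so the following is only a minimal sketch under stated assumptions. It assumes (hypothetically) that an expert's utility on a corpus is the total router gate weight it receives, and that "coverage" means filling the pruning budget by taking the next-best unkept expert from each corpus in turn. The function names `profile_utility` and `coverage_select` are illustrative, not the paper's API.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One routing decision for one token: (expert_id, gate_weight).
RoutingEvent = Tuple[int, float]


def profile_utility(events: Sequence[RoutingEvent], num_experts: int) -> List[float]:
    """Hypothetical utility: total gate weight each expert receives on one corpus."""
    utility = [0.0] * num_experts
    for expert_id, gate_weight in events:
        utility[expert_id] += gate_weight
    return utility


def coverage_select(per_corpus_utility: Dict[str, List[float]], keep: int) -> Set[int]:
    """Keep `keep` experts, taking the next-best unkept expert from each corpus in turn."""
    # Rank experts from highest to lowest utility, separately for every corpus.
    rankings = {
        name: sorted(range(len(u)), key=lambda e: u[e], reverse=True)
        for name, u in per_corpus_utility.items()
    }
    num_experts = len(next(iter(per_corpus_utility.values())))
    keep = min(keep, num_experts)
    kept: Set[int] = set()
    cursors = {name: 0 for name in rankings}
    while len(kept) < keep:
        for name, ranking in rankings.items():
            if len(kept) >= keep:
                break
            # Skip experts that another corpus has already secured.
            while ranking[cursors[name]] in kept:
                cursors[name] += 1
            kept.add(ranking[cursors[name]])
            cursors[name] += 1
    return kept


wiki = [0.9, 0.8, 0.1, 0.0]  # toy per-expert utilities on WikiText2
c4 = [0.7, 0.6, 0.0, 0.8]    # toy per-expert utilities on C4
print(sorted(coverage_select({"wikitext2": wiki, "c4": c4}, keep=2)))
```

With `keep=2` this sketch retains expert 0 (WikiText2's best) and expert 3 (C4's best). Ranking by summed utility would instead keep experts 0 and 1 and drop C4's top expert, which is the failure mode a per-corpus coverage rule is meant to avoid.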
IMPACT This method could enable more efficient deployment of large MoE models by reducing their size without significant performance loss.
RANK_REASON The cluster contains a research paper detailing a new method for pruning language models.
- C4
- DeepSeek-MoE-16B-Base
- ExpertSparsity
- Generic TB-Coverage
- mixture of experts
- Qwen1.5-MoE-A2.7B
- Reap
- WikiText2
AI-generated summary · Google Gemini · from 1 source.