Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 1w

Grouter: Decoupling Routing from Representation for Accelerated MoE Training

Researchers have introduced Grouter, a method for training Mixture-of-Experts (MoE) models that decouples the routing policy from the expert weights. Instead of learning the router jointly with the experts, Grouter uses a fixed router derived from pre-trained MoE models, which is reported to accelerate convergence and improve training stability. Grouter also incorporates expert folding and tuning to adapt the router to different model configurations and data distributions. The reported gains are in pre-training data utilization and training throughput.
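To make the idea of a fixed router concrete, here is a minimal sketch in plain Python. It is not Grouter's implementation: the names `FROZEN_ROUTER`, `route_top_k`, `moe_forward`, and `train_step` are hypothetical, each "expert" is reduced to a single scalar scale, and the router weights are hand-picked rather than derived from a pre-trained MoE. It only illustrates that routing decisions can stay constant while the expert parameters alone receive gradient updates.

```python
import math

# Hypothetical frozen router: one weight row per expert, stored as tuples
# so it cannot be modified during training.
FROZEN_ROUTER = ((1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(token, router_weights, k=2):
    """Return [(expert_index, gate)] for the k highest-scoring experts."""
    scores = [sum(w * x for w, x in zip(row, token)) for row in router_weights]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    gates = softmax([scores[i] for i in top])
    return list(zip(top, gates))


def moe_forward(token, routed, expert_scales):
    """Gate-weighted sum of expert outputs; expert i computes scale_i * token."""
    out = [0.0] * len(token)
    for idx, gate in routed:
        for j, x in enumerate(token):
            out[j] += gate * expert_scales[idx] * x
    return out


def train_step(token, target, expert_scales, lr=0.1):
    """One SGD step on 0.5 * squared error. Only expert_scales is updated."""
    routed = route_top_k(token, FROZEN_ROUTER)  # routing never changes
    out = moe_forward(token, routed, expert_scales)
    err = [o - t for o, t in zip(out, target)]
    for idx, gate in routed:
        # d(loss)/d(scale_idx) = gate * sum_j err_j * token_j
        grad = gate * sum(e * x for e, x in zip(err, token))
        expert_scales[idx] -= lr * grad
    return 0.5 * sum(e * e for e in err)


token, target = (2.0, 1.0), (4.0, 2.0)
scales = [1.0, 1.0, 1.0, 1.0]
losses = [train_step(token, target, scales) for _ in range(20)]
```

In this toy setup the same two experts are selected on every step, so only their scales move; the experts that the frozen router never selects keep their initial values.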

IMPACT Accelerates MoE training and improves data utilization, potentially lowering costs for large model development.

Mixture-of-Experts (MoE)
Yuqi Xu
Grouter