Redesign Mixture-of-Experts Routers with Manifold Power Iteration
Researchers have developed a new method called Manifold Power Iteration (MPI) to redesign the routers in Mixture-of-Experts (MoE) models. This technique aligns each router row with the principal singular direction of its associated expert, aiming to improve how tokens are routed to experts. Theoretical analysis suggests MPI drives router rows towards these principal directions, and empirical tests on MoE models ranging from 1B to 11B parameters show that this alignment leads to more effective models.
AI IMPACT: This research could lead to more efficient and effective Mixture-of-Experts models by improving their routing mechanisms.
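The summary does not spell out the MPI update rule, so the following is only a minimal sketch of the general idea it describes: textbook power iteration that pulls a router row toward the principal singular direction of an expert's weight matrix, renormalizing to the unit sphere after every step. It is not the authors' algorithm. The names `W_expert`, `align_router_row`, and `power_iteration_step`, the toy matrix, and the choice to treat the router row as living in the expert's input space (so the target is the top right singular vector) are all assumptions made for illustration.

```python
import math


def matvec(W, x):
    """Compute W @ x for a matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def matvec_t(W, y):
    """Compute W^T @ y for a matrix stored as a list of rows."""
    n_cols = len(W[0])
    return [sum(W[i][j] * y[i] for i in range(len(W))) for j in range(n_cols)]


def normalize(x):
    """Project a nonzero vector back onto the unit sphere."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def power_iteration_step(W, r):
    """One step of power iteration on W^T W: r <- normalize(W^T W r)."""
    return normalize(matvec_t(W, matvec(W, r)))


def align_router_row(W, r, steps=50):
    """Repeat the step so r approaches the top right singular vector of W.

    Assumption: the starting row is not orthogonal to that direction and
    the largest singular value is strictly larger than the second one.
    """
    r = normalize(r)
    for _ in range(steps):
        r = power_iteration_step(W, r)
    return r


# Hypothetical toy expert matrix (d_out=2, d_in=3). Its singular values are
# 3 and 1, and its principal right singular direction is the first axis.
W_expert = [
    [3.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
]

r0 = [1.0, 1.0, 1.0]  # arbitrary starting router row
r_aligned = align_router_row(W_expert, r0)

# Absolute cosine similarity with the known principal direction (1, 0, 0).
alignment = abs(r_aligned[0])
print(alignment)
```

In a real MoE layer each expert would have its own matrix and its own router row, and such a step would presumably be interleaved with training rather than run to convergence. How MPI actually does this is not described in the summary above.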