When Model Merging Breaks Routing: Training-Free Calibration for MoE
Researchers have identified a failure mode in merging Mixture-of-Experts (MoE) large language models, which they term "routing breakdown." It occurs when the merging process disrupts the MoE router's ability to direct tokens to the appropriate experts, which degrades performance. To address it, they propose Hessian-Aware Router Calibration (HARC), a training-free method that uses second-order curvature information to recalibrate the router. In the reported experiments, HARC improves performance on tasks such as mathematical reasoning and code generation.
AI IMPACT: This research offers a way to merge large language models, particularly MoE architectures, with less performance loss, potentially reducing the need for extensive retraining.
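
The routing breakdown described above can be illustrated with a toy example. The sketch below is not HARC and is not taken from the paper: it assumes a linear top-1 router (one weight row per expert) and assumes the merge is a plain average of two routers' weights, neither of which the summary specifies. All names (`top1_expert`, `average_routers`, `routing_agreement`) and all numbers are hypothetical. It only shows how a merged router can send a token to a different expert than the source model's router did.

```python
def router_logits(weights, token):
    """Score each expert with a dot product between its router row and the token."""
    return [sum(w * x for w, x in zip(row, token)) for row in weights]


def top1_expert(weights, token):
    """Return the index of the highest-scoring expert (first index wins ties)."""
    logits = router_logits(weights, token)
    return max(range(len(logits)), key=logits.__getitem__)


def average_routers(a, b):
    """Merge two routers by element-wise averaging (an assumed merge rule)."""
    return [[(wa + wb) / 2 for wa, wb in zip(ra, rb)] for ra, rb in zip(a, b)]


def routing_agreement(reference, merged, tokens):
    """Fraction of tokens that the merged router sends to the same expert."""
    same = sum(top1_expert(reference, t) == top1_expert(merged, t) for t in tokens)
    return same / len(tokens)


# Two hypothetical fine-tuned routers over 2 experts and 2 hidden dimensions.
router_a = [[2.0, 0.0], [0.0, 1.0]]
router_b = [[0.0, 2.0], [1.0, 0.0]]
tokens = [[1.0, 0.0], [0.0, 1.0]]

merged = average_routers(router_a, router_b)
agreement_a = routing_agreement(router_a, merged, tokens)
agreement_b = routing_agreement(router_b, merged, tokens)
print(f"agreement with A: {agreement_a}, with B: {agreement_b}")
```

In this toy setup the merged router matches each source router on only half of the tokens, so each expert receives some tokens it was never trained to handle. A calibration method such as HARC would aim to correct the router after merging; how it uses curvature information to do so is not described in the summary and is not sketched here.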