Researchers have developed SARA, a framework for improving the performance of Mixture-of-Experts (MoE) models on low-resource languages. SARA targets a routing mismatch: tokens from low-resource languages are often sent to different experts than tokens from high-resource languages, which hinders cross-lingual knowledge transfer. Using a Jensen-Shannon divergence constraint, SARA aligns the routing distributions inside MoE layers so that specialized capabilities learned for high-resource languages transfer to low-resource ones. In experiments, SARA improves results on benchmarks such as Global-MMLU for models including Qwen3-30B-A3B and Phi-3.5-MoE-instruct.
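To make the alignment idea concrete, the following is a minimal sketch of a Jensen-Shannon divergence between two expert-routing distributions. It is not SARA's implementation: the summary does not say how the routing distributions are built or how the constraint enters training, so averaging per-token softmax router probabilities is an assumption, and the names `softmax`, `mean_routing_distribution`, `kl_divergence`, and `js_divergence` are hypothetical.

```python
import math


def softmax(logits):
    """Turn one token's router logits into probabilities over experts."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_routing_distribution(token_logits):
    """Average per-token router probabilities (assumed aggregation, not from the source)."""
    probs = [softmax(row) for row in token_logits]
    n_experts = len(probs[0])
    return [sum(p[i] for p in probs) / len(probs) for i in range(n_experts)]


def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded above by ln 2."""
    mix = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, mix) + 0.5 * kl_divergence(q, mix)


# Toy router logits for 4 experts (made-up numbers, for illustration only).
high_resource = mean_routing_distribution([[4.0, 0.0, 0.0, 0.0], [3.0, 1.0, 0.0, 0.0]])
low_resource = mean_routing_distribution([[0.0, 0.0, 4.0, 0.0], [0.0, 0.0, 3.0, 1.0]])

# A large value means the two languages favor different experts;
# an alignment constraint would push this value toward zero.
routing_gap = js_divergence(high_resource, low_resource)
```

The Jensen-Shannon form is a natural fit for this kind of constraint because it is symmetric and stays finite even when one language never routes to an expert the other uses heavily, which plain KL divergence does not guarantee.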
IMPACT Enhances multilingual capabilities in sparse AI architectures, potentially improving accessibility and performance for low-resource languages.
RANK_REASON Academic paper detailing a new framework for improving MoE models.
AI-generated summary · Google Gemini · from 2 sources.