Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts
A new paper analyzes the discontinuities inherent in Sparse Mixture-of-Experts (SMoE) architectures. These discontinuities arise from Top-k expert selection: a small change in the input can switch which experts are selected and so produce a significantly different output. The study gives a geometric and stochastic analysis that classifies these discontinuities and estimates their volume. It also models input perturbations as a diffusion process and shows that perturbed paths are likely to encounter lower-order discontinuities first. Based on these findings, the paper proposes a smoothing mechanism for SMoEs that improves continuity and empirical performance on language and vision tasks with minimal computational overhead.
IMPACT: This research could lead to more stable, better-performing Mixture-of-Experts models by smoothing the discontinuities that Top-k routing introduces.
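To make the Top-k discontinuity concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than the paper's setup: the two scalar "experts", the linear gate weights `GATE_W`, and the helper names `top_k_moe` and `dense_moe` are all hypothetical. The dense softmax mixture is included only as a continuous point of comparison; it is not the smoothing mechanism the paper proposes.

```python
import math

# Hypothetical toy setup: scalar input, two "experts", and a linear gate.
# None of these names or values come from the paper.
EXPERTS = [lambda x: 2.0 * x + 1.0, lambda x: -3.0 * x]
GATE_W = [1.0, -1.0]  # gate logit for expert i is GATE_W[i] * x


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(x, k=1):
    """Sparse MoE: keep the k largest gate logits, renormalize, and mix."""
    logits = [w * x for w in GATE_W]
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    return sum(p * EXPERTS[i](x) for p, i in zip(weights, chosen))


def dense_moe(x):
    """Dense softmax mixture over all experts, shown only for contrast."""
    weights = softmax([w * x for w in GATE_W])
    return sum(p * f(x) for p, f in zip(weights, EXPERTS))


# Evaluate just either side of x = 0, where the top-1 expert switches.
eps = 1e-6
sparse_jump = abs(top_k_moe(eps) - top_k_moe(-eps))
dense_jump = abs(dense_moe(eps) - dense_moe(-eps))
print(f"top-1 jump across x=0: {sparse_jump:.6f}")
print(f"dense jump across x=0: {dense_jump:.6f}")
```

In this toy case the gate logits cross at x = 0, so the top-1 output jumps by roughly the gap between the two experts' values there, while the dense mixture changes only negligibly over the same interval. With `k=2` both experts are kept and `top_k_moe` coincides with `dense_moe`, which illustrates that the jump comes from the hard selection step rather than from the experts themselves.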