Robustness of Mixtures of Experts to Feature Noise
Two new research papers examine Mixture of Experts (MoE) models. The first shows that MoE architectures inherently filter feature noise, which improves robustness and efficiency relative to dense networks. The second introduces a statistical framework for softmax-gated Gaussian MoE models that addresses parameter estimation challenges and proposes a consistent method for selecting the number of experts without extensive model sweeps.
AI IMPACT: These papers advance the theoretical understanding of MoE models, potentially leading to more robust and efficient AI systems.
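For readers unfamiliar with the term, a softmax-gated Gaussian MoE is commonly written as a conditional mixture: a softmax over linear scores of the input weights each expert, and each expert is a Gaussian whose mean is linear in the input. The sketch below evaluates that density in plain Python. It follows the textbook form only; the exact parameterization used in the second paper may differ, and the names `moe_density`, `EXPERTS`, and the dictionary keys are hypothetical, chosen for illustration.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gaussian_pdf(y, mean, var):
    """Density of a univariate Gaussian N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def dot(w, x):
    """Inner product of two equal-length lists."""
    return sum(wi * xi for wi, xi in zip(w, x))

def moe_density(y, x, experts):
    """Conditional density p(y | x) of a softmax-gated Gaussian MoE.

    Each expert is a dict with illustrative (assumed) keys:
      gate_w, gate_b : parameters of the linear gating score
      mean_w, mean_b : parameters of the linear expert mean
      var            : expert variance
    """
    gates = softmax([dot(e["gate_w"], x) + e["gate_b"] for e in experts])
    return sum(
        g * gaussian_pdf(y, dot(e["mean_w"], x) + e["mean_b"], e["var"])
        for g, e in zip(gates, experts)
    )

# Two made-up experts on a one-dimensional input, for illustration only.
EXPERTS = [
    {"gate_w": [1.0], "gate_b": 0.0, "mean_w": [2.0], "mean_b": 0.0, "var": 1.0},
    {"gate_w": [-1.0], "gate_b": 0.0, "mean_w": [-1.0], "mean_b": 1.0, "var": 0.5},
]

if __name__ == "__main__":
    print(moe_density(0.5, [0.3], EXPERTS))
```

In this form, the number of experts is the length of the `experts` list, which is the quantity the second paper's selection method is described as choosing without sweeping over many candidate models.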