PulseAugur
Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

Paper analyzes discontinuities in sparse Mixture-of-Experts models

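Researchers have published a paper analyzing the discontinuities inherent in Sparse Mixture-of-Experts (SMoE) architectures. These discontinuities arise from Top-k expert selection: a small change in the input can change which experts are selected, and therefore produce a markedly different output. The study provides a geometric and stochastic analysis that classifies these discontinuities and estimates their volume. It also models input perturbations as a diffusion process to show that trajectories are likely to encounter lower-order discontinuities first. Building on these findings, the paper proposes a smoothing mechanism for SMoE that improves continuity and empirical performance on language and vision tasks at minimal computational overhead.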
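To make the Top-k discontinuity concrete, here is a minimal sketch on a toy model. Everything in it is an illustrative assumption rather than anything taken from the paper: the two linear experts, the linear router, the scalar input, and the helper name `smoe` are all hypothetical. The dense case (k equal to the number of experts) is shown only as a continuous baseline; it is not the smoothing mechanism the paper proposes.

```python
import math

# Hypothetical toy model: two linear experts and a linear router on a scalar input.
ROUTER = [(1.0, 0.0), (-1.0, 0.0)]   # (weight, bias) per expert; logits tie at x = 0
EXPERTS = [(1.0, 1.0), (1.0, -1.0)]  # (slope, intercept) per expert


def smoe(x, k):
    """Top-k sparse mixture: softmax over the k largest router logits only."""
    logits = [w * x + b for w, b in ROUTER]
    # Indices of the k largest logits (ties go to the lower index).
    top = sorted(range(len(logits)), key=lambda i: -logits[i])[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return sum(exps[i] / z * (EXPERTS[i][0] * x + EXPERTS[i][1]) for i in top)


eps = 1e-6
# Top-1 routing: crossing x = 0 swaps the selected expert, so the output jumps.
jump_top1 = abs(smoe(eps, k=1) - smoe(-eps, k=1))
# Dense mixture (k = 2): every expert contributes, so the output varies smoothly.
jump_dense = abs(smoe(eps, k=2) - smoe(-eps, k=2))
print(jump_top1, jump_dense)
```

In this toy setup the Top-1 output changes by roughly 2 when the input moves by only 2e-6 across the routing boundary, while the dense mixture changes by an amount on the order of the perturbation itself.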

Impact: By addressing these inherent discontinuities, this research could lead to more stable, better-performing Mixture-of-Experts models.

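Ranking rationale: This cluster contains an academic paper detailing a theoretical analysis of sparse Mixture-of-Experts models and a proposed method.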

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Tho Tran Huu, Huu-Tuan Nguyen, Thien-Hai Nguyen, Nhat-Tri Ho, Viet-Hoang Tran, Tho Quan, Tan Minh Nguyen ·

    Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

    arXiv:2606.19036v1 Announce Type: new Abstract: Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, this very Top-$k$ expert selection that…

  2. arXiv cs.LG TIER_1 English(EN) · Tan Minh Nguyen ·

    Geometric and Stochastic Analysis of Discontinuities in Sparse Mixture-of-Experts

    Sparse Mixture-of-Experts (SMoE) architectures are now widely deployed in state-of-the-art language and vision models, where conditional routing allows scaling to very large networks. However, this very Top-$k$ expert selection that enables conditional routing also renders the SM…