PulseAugur
English(EN) PreMoE: Proactive Inference for Efficient Mixture-of-Experts

New MoE Architectures Improve Efficiency and Performance

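Researchers are developing techniques to improve Mixture-of-Experts (MoE) models, with a focus on domain transitions and inference efficiency. One approach, inspired by the free energy principle and spiking neural networks, introduces temporal memory and anticipatory routing to improve expert selection during domain shifts. Other work optimizes MoE inference through runtime-aware scheduling frameworks and novel kernel configurations to maximize throughput. Further methods address heterogeneous expert sizes and preserve the knowledge of rarely used experts during fine-tuning, with the aim of improving performance and resource utilization.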

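Impact: The new methods could yield more efficient and more robust MoE models, potentially lowering inference costs and improving performance across different tasks.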

Ranking rationale: Multiple arXiv papers detail new research on Mixture-of-Experts (MoE) architectures and optimization.


AI-generated summary · Google Gemini · based on 17 sources.


Sources [17]

  1. arXiv cs.LG TIER_1 English(EN) · Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng ·

    UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

    arXiv:2605.06665v1 Announce Type: new Abstract: Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and …

  2. Hugging Face Daily Papers TIER_1 English(EN)

    UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

    Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…

  3. arXiv cs.AI TIER_1 English(EN) · Hong Cheng ·

    UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

    Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…

  4. arXiv cs.LG TIER_1 English(EN) · Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan ·

    AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures

    arXiv:2605.04754v1 Announce Type: new Abstract: Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied…

  5. arXiv cs.LG TIER_1 English(EN) · Reza Rastegar ·

    Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts

    arXiv:2605.02124v1 Announce Type: new Abstract: Softmax-routed mixture-of-experts models approach hard routing as the temperature tends to zero, but this limit is singular near routing ties. This paper studies that singularity at the population level for squared-loss MoE regressi…

  6. arXiv cs.LG TIER_1 English(EN) · Man Yung Wong (Russell) ·

    Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

    arXiv:2605.00604v1 Announce Type: new Abstract: Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 pro…

  7. arXiv cs.LG TIER_1 English(EN) · Man Yung Wong ·

    Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

    Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition…

  8. Hugging Face Daily Papers TIER_1 English(EN)

    Prediction-powered Inference by Mixture of Experts

    The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…

  9. arXiv cs.AI TIER_1 English(EN) · Vyom Sharma, Debajyoti Datta ·

    RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

    arXiv:2604.26039v1 Announce Type: cross Abstract: The optimal kernel configuration for Mixture-of-Experts (MoE) inference depends on both batch size and the expert routing distribution, yet production systems dispatch from batch size alone, leaving 10-70% of kernel throughput unr…

  10. arXiv cs.CL TIER_1 English(EN) · Zhicheng Ma, Xiang Liu, Zhaoxiang Liu, Ning Wang, Yi Shen, Kai Wang, Shuming Shi, Shiguo Lian ·

    Mixture of Heterogeneous Grouped Experts for Language Modeling

    arXiv:2604.23108v1 Announce Type: new Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert sizes, creating a rigidity that f…

  11. arXiv cs.LG TIER_1 English(EN) · Abhimanyu Bambhaniya, Geonhwa Jeong, Jason Park, Jiecao Yu, Jaewon Lee, Pengchao Wang, Changkyu Kim, Chunqiang Tang, Tushar Krishna ·

    Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns

    arXiv:2604.23150v1 Announce Type: new Abstract: Most recent state-of-the-art (SOTA) large language models (LLMs) use Mixture-of-Experts (MoE) architectures to scale model capacity without proportional per-token compute, enabling higher-quality outputs at manageable serving costs.…

  12. arXiv cs.LG TIER_1 English(EN) · X. Y. Han, Yuan Zhong ·

    A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

    arXiv:2512.03915v3 Announce Type: replace-cross Abstract: In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minim…

  13. arXiv cs.CL TIER_1 English(EN) · Haoze He, Xingyuan Ding, Xuan Jiang, Xinkai Zou, Alex Cheng, Yibo Zhao, Juncheng Billy Li, Heather Miller ·

    Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning

    arXiv:2604.23036v1 Announce Type: cross Abstract: Despite MoE models leading many benchmarks, supervised fine-tuning (SFT) for the MoE architectures remains difficult because its router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mi…

  14. arXiv cs.LG TIER_1 English(EN) · Zehua Pei, Ying Zhang, Hui-Ling Zhen, Tao Yuan, Xianzhi Yu, Zhenhua Dong, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu ·

    PreMoE: Proactive Inference for Efficient Mixture-of-Experts

    arXiv:2505.17639v3 Announce Type: replace Abstract: Mixture-of-Experts (MoE) models offer dynamic computation, but are typically deployed as static full-capacity models, missing opportunities for deployment-specific specialization. We introduce PreMoE, a training-free framework t…

  15. arXiv stat.ML TIER_1 English(EN) · Yanwu Gu, Linglong Kong, Dong Xia ·

    Prediction-powered Inference by Mixture of Experts

    arXiv:2604.27892v1 Announce Type: new Abstract: The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. Thes…

  16. arXiv stat.ML TIER_1 English(EN) · Dong Xia ·

    Prediction-powered Inference by Mixture of Experts

    The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…

  17. arXiv stat.ML TIER_1 English(EN) · Alessandro Rinaldo ·

    On Bayesian Softmax-Gated Mixture-of-Experts Models

    Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet th…