English(EN)PreMoE: Proactive Inference for Efficient Mixture-of-Experts
新的MoE架构提升效率和性能
作者PulseAugur 编辑部·[17 个来源]·
研究人员正在开发先进技术来改进专家混合(MoE)模型,特别关注解决领域转换和推理效率方面的挑战。一种受自由能原理和脉冲神经网络启发的方法,引入了时间记忆和预期路由,以显著增强领域转移期间的专家选择。其他研究则侧重于通过运行时感知调度框架和新颖的内核配置来优化MoE推理,以最大化吞吐量。此外,还在探索新的方法来管理异构专家大小并在微调过程中保留较少使用的专家的知识,旨在提高性能和资源利用率。
AI
arXiv:2605.06665v1 Announce Type: new Abstract: Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and …
Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…
Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…
arXiv cs.LG
TIER_1English(EN)·Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan·
arXiv:2605.04754v1 Announce Type: new Abstract: Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied…
arXiv:2605.02124v1 Announce Type: new Abstract: Softmax-routed mixture-of-experts models approach hard routing as the temperature tends to zero, but this limit is singular near routing ties. This paper studies that singularity at the population level for squared-loss MoE regressi…
arXiv cs.LG
TIER_1English(EN)·Man Yung Wong (Russell)·
arXiv:2605.00604v1 Announce Type: new Abstract: Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 pro…
Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition…
The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…
arXiv:2604.26039v1 Announce Type: cross Abstract: The optimal kernel configuration for Mixture-of-Experts (MoE) inference depends on both batch size and the expert routing distribution, yet production systems dispatch from batch size alone, leaving 10-70% of kernel throughput unr…
arXiv cs.CL
TIER_1English(EN)·Zhicheng Ma, Xiang Liu, Zhaoxiang Liu, Ning Wang, Yi Shen, Kai Wang, Shuming Shi, Shiguo Lian·
arXiv:2604.23108v1 Announce Type: new Abstract: Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert sizes,creating a rigidity that f…
arXiv cs.LG
TIER_1English(EN)·Abhimanyu Bambhaniya, Geonhwa Jeong, Jason Park, Jiecao Yu, Jaewon Lee, Pengchao Wang, Changkyu Kim, Chunqiang Tang, Tushar Krishna·
arXiv:2604.23150v1 Announce Type: new Abstract: Most recent state-of-the-art (SOTA) large language models (LLMs) use Mixture-of-Experts (MoE) architectures to scale model capacity without proportional per-token compute, enabling higher-quality outputs at manageable serving costs.…
arXiv cs.LG
TIER_1English(EN)·X. Y. Han, Yuan Zhong·
arXiv:2512.03915v3 Announce Type: replace-cross Abstract: In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minim…
arXiv cs.CL
TIER_1English(EN)·Haoze He, Xingyuan Ding, Xuan Jiang, Xinkai Zou, Alex Cheng, Yibo Zhao, Juncheng Billy Li, Heather Miller·
arXiv:2604.23036v1 Announce Type: cross Abstract: Despite MoE models leading many benchmarks, supervised fine-tuning (SFT) for the MoE architectures remains difficult because its router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mi…
arXiv:2505.17639v3 Announce Type: replace Abstract: Mixture-of-Experts (MoE) models offer dynamic computation, but are typically deployed as static full-capacity models, missing opportunities for deployment-specific specialization. We introduce PreMoE, a training-free framework t…
arXiv:2604.27892v1 Announce Type: new Abstract: The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. Thes…
The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…
Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet th…