PreMoE: Proactive Inference for Efficient Mixture-of-Experts

New MoE Architectures Improve Efficiency and Performance

By the PulseAugur editorial team · [17 sources] · 2026-04-22 13:37

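Researchers are developing advanced techniques to improve Mixture-of-Experts (MoE) models, focusing in particular on domain transitions and inference efficiency. One approach, inspired by the free energy principle and spiking neural networks, introduces temporal memory and anticipatory routing to significantly improve expert selection during domain shifts. Other work optimizes MoE inference through runtime-aware scheduling frameworks and novel kernel configurations to maximize throughput. Further research explores new ways to handle heterogeneous expert sizes and to preserve the knowledge of rarely used experts during fine-tuning, with the aim of improving performance and resource utilization.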

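Impact: The new methods promise more efficient and more robust MoE models, potentially lowering inference costs and improving performance across different tasks.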

Ranking rationale: Multiple arXiv papers detail new research on Mixture-of-Experts (MoE) architectures and optimization.


AI-generated summary · Google Gemini · from 17 sources.
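For readers new to the area, the "expert selection" discussed in the summary can be pictured with a generic top-k router. The following is a minimal sketch of standard sparse routing, not the method of any paper listed on this page; the function name `top_k_route`, the choice `k=2`, and the example scores are all hypothetical.

```python
import math

def top_k_route(scores, k=2):
    """Pick the k highest-scoring experts and renormalize their softmax weights."""
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router scores (ties keep the lower index first).
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax restricted to the chosen experts; subtract the max for stability.
    m = max(scores[i] for i in chosen)
    exps = {i: math.exp(scores[i] - m) for i in chosen}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Hypothetical router scores for one token over four experts.
router_scores = [0.1, 2.3, -0.7, 1.9]
weights = top_k_route(router_scores, k=2)
print(weights)  # maps each selected expert index to its mixing weight
```

Only the selected experts run for the token, which is why the routing decision matters so much when the input distribution changes.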

Sources [17]

arXiv cs.LG TIER_1 English(EN) · Minbin Huang, Han Shi, Chuanyang Zheng, Yimeng Wu, Guoxuan Chen, Xintong Yu, Yichun Yin, Hong Cheng · 2026-05-08 04:00

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

arXiv:2605.06665v1 (announce type: new). Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and …
Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-07 17:59

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…
arXiv cs.AI TIER_1 English(EN) · Hong Cheng · 2026-05-07 17:59

UniPool: A Globally Shared Expert Pool for Mixture-of-Experts

Modern Mixture-of-Experts (MoE) architectures allocate expert capacity through a rigid per-layer rule: each transformer layer owns a separate expert set. This convention couples depth scaling with linear expert-parameter growth and assumes that every layer needs isolated expert c…
arXiv cs.LG TIER_1 English(EN) · Omkar B Shende, Marcello Traiola, Gayathri Ananthanarayanan · 2026-05-07 04:00

AxMoE: Characterizing the Impact of Quantized Multipliers on Mixture-of-Experts Deep Neural Network Architectures

arXiv:2605.04754v1 (announce type: new). Deep neural network (DNN) inference at the edge demands simultaneous improvements in accuracy, computational efficiency, and energy consumption. Approximate computing and Mixture-of-Experts (MoE) architectures have each been studied…
arXiv cs.LG TIER_1 English(EN) · Reza Rastegar · 2026-05-05 04:00

Boundary Mass and the Soft-to-Hard Limit in Mixture-of-Experts

arXiv:2605.02124v1 (announce type: new). Softmax-routed mixture-of-experts models approach hard routing as the temperature tends to zero, but this limit is singular near routing ties. This paper studies that singularity at the population level for squared-loss MoE regressi…
arXiv cs.LG TIER_1 English(EN) · Man Yung Wong (Russell) · 2026-05-04 04:00

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

arXiv:2605.00604v1 (announce type: new). Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 pro…
arXiv cs.LG TIER_1 English(EN) · Man Yung Wong · 2026-05-01 12:18

Affinity Is Not Enough: Recovering the Free Energy Principle in Mixture-of-Experts

Sparse MoE routing fails at domain transitions, where the current token belongs to one distribution and the next to another. In a controlled experiment (4 experts, 5 seeds), standard affinity routing assigns only 0.006 +/- 0.001 probability to the correct expert at the transition…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-04-30 14:08

Prediction-powered Inference by Mixture of Experts

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…
arXiv cs.AI TIER_1 English(EN) · Vyom Sharma, Debajyoti Datta · 2026-04-30 04:00

RaMP: Runtime-Aware Megakernel Polymorphism for Mixture-of-Experts

arXiv:2604.26039v1 (announce type: cross). The optimal kernel configuration for Mixture-of-Experts (MoE) inference depends on both batch size and the expert routing distribution, yet production systems dispatch from batch size alone, leaving 10-70% of kernel throughput unr…
arXiv cs.CL TIER_1 English(EN) · Zhicheng Ma, Xiang Liu, Zhaoxiang Liu, Ning Wang, Yi Shen, Kai Wang, Shuming Shi, Shiguo Lian · 2026-04-28 04:00

Heterogeneous Grouped Mixture-of-Experts for Language Modeling

arXiv:2604.23108v1 (announce type: new). Large Language Models (LLMs) based on Mixture-of-Experts (MoE) are pivotal in industrial applications for their ability to scale performance efficiently. However, standard MoEs enforce uniform expert sizes, creating a rigidity that f…
arXiv cs.LG TIER_1 English(EN) · Abhimanyu Bambhaniya, Geonhwa Jeong, Jason Park, Jiecao Yu, Jaewon Lee, Pengchao Wang, Changkyu Kim, Chunqiang Tang, Tushar Krishna · 2026-04-28 04:00

Scaling Multi-Node Mixture-of-Experts Inference Using Expert Activation Patterns

arXiv:2604.23150v1 (announce type: new). Most recent state-of-the-art (SOTA) large language models (LLMs) use Mixture-of-Experts (MoE) architectures to scale model capacity without proportional per-token compute, enabling higher-quality outputs at manageable serving costs.…
arXiv cs.LG TIER_1 English(EN) · X. Y. Han, Yuan Zhong · 2026-04-28 04:00

A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models

arXiv:2512.03915v3 (announce type: replace-cross). In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minim…
arXiv cs.CL TIER_1 English(EN) · Haoze He, Xingyuan Ding, Xuan Jiang, Xinkai Zou, Alex Cheng, Yibo Zhao, Juncheng Billy Li, Heather Miller · 2026-04-28 04:00

Preserving Long-Tail Expert Information in Mixture-of-Experts Fine-Tuning

arXiv:2604.23036v1 (announce type: cross). Despite MoE models leading many benchmarks, supervised fine-tuning (SFT) for the MoE architectures remains difficult because its router layers are fragile. Methods such as DenseMixer and ESFT mitigate router collapse with dense mi…
arXiv cs.LG TIER_1 English(EN) · Zehua Pei, Ying Zhang, Hui-Ling Zhen, Tao Yuan, Xianzhi Yu, Zhenhua Dong, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu · 2026-04-27 04:00

PreMoE: Proactive Inference for Efficient Mixture-of-Experts

arXiv:2505.17639v3 (announce type: replace). Mixture-of-Experts (MoE) models offer dynamic computation, but are typically deployed as static full-capacity models, missing opportunities for deployment-specific specialization. We introduce PreMoE, a training-free framework t…
arXiv stat.ML TIER_1 English(EN) · Yanwu Gu, Linglong Kong, Dong Xia · 2026-05-01 04:00

Prediction-powered Inference by Mixture of Experts

arXiv:2604.27892v1 (announce type: new). The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. Thes…
arXiv stat.ML TIER_1 English(EN) · Dong Xia · 2026-04-30 14:08

Prediction-powered Inference by Mixture of Experts

The rapidly expanding artificial intelligence (AI) industry has produced diverse yet powerful prediction tools, each with its own network architecture, training strategy, data-processing pipeline, and domain-specific strengths. These tools create new opportunities for semi-superv…
arXiv stat.ML TIER_1 English(EN) · Alessandro Rinaldo · 2026-04-22 13:37

On Bayesian Softmax-Gated Mixture-of-Experts Models

Mixture-of-experts models provide a flexible framework for learning complex probabilistic input-output relationships by combining multiple expert models through an input-dependent gating mechanism. These models have become increasingly prominent in modern machine learning, yet th…
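Several entries above turn on how a softmax gate behaves. The "Boundary Mass and the Soft-to-Hard Limit" abstract, for instance, notes that softmax routing approaches hard routing as the temperature tends to zero, and that this limit is singular near routing ties. The sketch below illustrates only that general observation with a plain temperature-scaled softmax; it is not taken from any listed paper, and the helper names and score vectors are hypothetical.

```python
import math

def softmax_gate(scores, temperature):
    """Return softmax routing weights for one token at the given temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def hard_route(scores):
    """Return the index of the highest-scoring expert (first index on a tie)."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Hypothetical router scores: one clear winner, and one exact tie at the top.
clear = [2.0, 1.0, 0.5, 0.0]
tied = [1.0, 1.0, 0.5, 0.0]

for t in (1.0, 0.1, 0.01):
    clear_w = [round(w, 3) for w in softmax_gate(clear, t)]
    tied_w = [round(w, 3) for w in softmax_gate(tied, t)]
    print(f"T={t}: clear={clear_w} tied={tied_w}")
```

With a clear winner, the weights concentrate on the expert that `hard_route` would pick as the temperature shrinks. With an exact tie, the two top experts keep sharing the weight equally at every temperature, so no single hard assignment is recovered in the limit.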
