SMoE paper proposes expert substitution for efficient edge MoE deployment

By PulseAugur Editorial Team · [1 source] · 2026-05-06 04:00

Researchers have proposed SMoE, an algorithm-system co-design for running Mixture of Experts (MoE) models on edge devices. To work within limited memory, it offloads experts dynamically and replaces low-importance activated experts with functionally similar experts that are already cached. Because the system prioritizes reuse of the expert cache, it significantly reduces decoding latency and PCIe overhead while maintaining accuracy.
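
The substitution idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm: the function name `plan_experts`, the `importance_threshold` and `min_similarity` parameters, and the precomputed `similarity` table are all hypothetical, chosen only to show how a low-importance cache miss might be served by a similar cached expert instead of triggering a transfer.

```python
from typing import Dict, List, Set, Tuple


def plan_experts(
    router_scores: Dict[int, float],
    cached: Set[int],
    similarity: Dict[Tuple[int, int], float],
    top_k: int = 2,
    importance_threshold: float = 0.3,  # hypothetical cutoff for "low importance"
    min_similarity: float = 0.8,        # hypothetical cutoff for "functionally similar"
) -> Tuple[List[Tuple[int, float]], List[int]]:
    """Return (experts_to_run, experts_to_fetch) for one token at one MoE layer."""
    # Standard top-k routing: keep the k highest-scoring experts.
    activated = sorted(router_scores, key=router_scores.get, reverse=True)[:top_k]
    total = sum(router_scores[e] for e in activated)

    to_run: List[Tuple[int, float]] = []
    to_fetch: List[int] = []
    # Cached experts that are already activated cannot double as substitutes.
    used: Set[int] = set(activated) & cached

    for expert in activated:
        weight = router_scores[expert] / total  # renormalised gate weight
        if expert in cached:
            to_run.append((expert, weight))  # cache hit, no transfer needed
            continue

        # Cache miss: look for the most similar cached expert not yet in use.
        candidates = [c for c in sorted(cached) if c not in used]
        best = max(
            candidates,
            key=lambda c: similarity.get((expert, c), 0.0),
            default=None,
        )
        can_substitute = (
            weight < importance_threshold
            and best is not None
            and similarity.get((expert, best), 0.0) >= min_similarity
        )
        if can_substitute:
            used.add(best)
            to_run.append((best, weight))  # substitute inherits the gate weight
        else:
            to_fetch.append(expert)  # important expert: pay the transfer cost
            to_run.append((expert, weight))
    return to_run, to_fetch


# Expert 1 is activated with a small weight and is not cached, so it is
# replaced by cached expert 3, which the (made-up) table marks as similar.
run, fetch = plan_experts(
    router_scores={0: 0.6, 1: 0.2, 2: 0.15, 3: 0.05},
    cached={0, 3},
    similarity={(1, 3): 0.9},
)
print(run, fetch)
```

In this sketch an expert is only fetched when it is either important or has no sufficiently similar cached stand-in, which is one way a cache-reuse policy could cut transfers over PCIe. How SMoE actually measures importance and similarity is not stated in the summary.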

Impact: Enables efficient deployment of large MoE models on resource-constrained edge hardware, potentially broadening AI accessibility.

Ranking rationale: This is a research paper detailing a new algorithm and system design for running MoE models on edge devices.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · From 1 source. How we write summaries →

Sources [1]

arXiv cs.AI TIER_1 English (EN) · Guoying Zhu, Meng Li, Haipeng Dai, Xuechen Liu, Weijun Wang, Keran Li, Jun Xiao, Ligeng Chen, Wei Wang · 2026-05-06 04:00

SMoE: An Algorithm-System Co-Design for Pushing MoE to the Edge via Expert Substitution

arXiv:2508.18983v3 · Announce Type: replace · Abstract: The Mixture of Experts (MoE) architecture has emerged as a key technique for scaling Large Language Models by activating only a subset of experts per query. Deploying MoE on consumer-grade edge hardware, however, is constrained …
