Researchers have developed SMoE, a novel algorithm-system co-design aimed at enabling Mixture of Experts (MoE) models to run on edge devices. This approach tackles memory limitations by dynamically offloading experts and substituting low-importance activated experts with cached, functionally similar ones. The system prioritizes expert cache reuse, significantly reducing decoding latency and PCIe overhead while maintaining accuracy.
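The substitution idea can be illustrated with a minimal Python sketch. It is not the SMoE implementation: the summary does not say how importance or functional similarity are measured, so this sketch assumes the router's gate score stands in for importance and that cosine similarity between per-expert signature vectors stands in for functional similarity. The names `ExpertCache`, `select_experts`, `signatures`, and both thresholds are hypothetical. The LRU eviction policy is also an assumption, and a transfer over PCIe is only simulated by a counter.

```python
import math
from collections import OrderedDict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


class ExpertCache:
    """Hypothetical LRU set of expert ids resident in fast memory."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # oldest entry first
        self.loads = 0  # simulated transfers from slow memory

    def touch(self, expert_id):
        # Mark a resident expert as most recently used.
        self.resident.move_to_end(expert_id)

    def load(self, expert_id):
        if expert_id in self.resident:
            self.touch(expert_id)
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict least recently used
        self.resident[expert_id] = True
        self.loads += 1


def select_experts(gate_scores, top_k, cache, signatures,
                   importance_threshold=0.2, similarity_threshold=0.9):
    """Pick top_k experts, swapping low-importance cache misses for
    sufficiently similar experts that are already cached (assumed rule)."""
    ranked = sorted(range(len(gate_scores)),
                    key=lambda e: gate_scores[e], reverse=True)
    top = ranked[:top_k]
    chosen = []
    for expert in top:
        if expert in cache.resident:
            cache.touch(expert)
            chosen.append(expert)
            continue
        substitute = None
        if gate_scores[expert] < importance_threshold:
            # Only consider cached experts not otherwise used for this token.
            candidates = [c for c in cache.resident
                          if c not in chosen and c not in top]
            if candidates:
                best = max(candidates,
                           key=lambda c: cosine(signatures[expert], signatures[c]))
                if cosine(signatures[expert], signatures[best]) >= similarity_threshold:
                    substitute = best
        if substitute is None:
            cache.load(expert)  # important or no close match: pay the transfer
            chosen.append(expert)
        else:
            cache.touch(substitute)  # reuse the cached look-alike, no transfer
            chosen.append(substitute)
    return chosen


# Usage: four experts, room for two in fast memory, experts 0 and 1 preloaded.
signatures = [[1.0, 0.0], [0.0, 1.0], [0.99, 0.1], [0.1, 0.99]]
cache = ExpertCache(capacity=2)
cache.load(0)
cache.load(1)
picked = select_experts([0.05, 0.1, 0.7, 0.15], top_k=2,
                        cache=cache, signatures=signatures)
```

In this toy setup the high-scoring expert is always fetched, while a low-scoring miss is served by a cached neighbour when one is similar enough, which is how a reuse-first policy would cut transfers. The sketch assumes `capacity` is at least `top_k`, so that an eviction never removes an expert already chosen for the current token.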
Impact: Enables efficient deployment of large MoE models on resource-constrained edge hardware, potentially broadening AI accessibility.
Ranking rationale: This is a research paper detailing a new algorithm and system design for running MoE models on edge devices.
AI-generated summary · Google Gemini · from 1 source.