Does Mixture-of-Experts Actually Help Inference on Consumer and Edge Hardware? An Empirical Study

MoE models deliver mixed inference performance on consumer and edge hardware

By the PulseAugur editorial team · [1 source] · 2026-06-24 04:00

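A new study examines whether Mixture-of-Experts (MoE) language models deliver real inference advantages on consumer and edge hardware. It finds that although MoE models can in theory reduce the compute required per token, this advantage is not always realized in practice. Results vary by device: MoE models are at a slight disadvantage on laptops and a more pronounced one on edge devices, where they consume more energy and run into memory limits.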
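The tension between lower per-token compute and higher memory pressure can be illustrated with back-of-the-envelope arithmetic. The sketch below is not the study's method: the `MoEConfig` class, its parameter counts, the 2-byte (fp16) weight size, and the 8 GB device budget are all hypothetical, and it uses the common rule of thumb of roughly 2 FLOPs per active parameter per token. It counts weights only, ignoring KV cache, activations, routing overhead, and any measured runtime effects.

```python
from dataclasses import dataclass


@dataclass
class MoEConfig:
    """Hypothetical MoE layout used only for illustration."""
    shared_params: float  # parameters every token uses (attention, embeddings, ...)
    expert_params: float  # parameters in ONE expert
    num_experts: int      # experts stored in the model (all must be resident)
    top_k: int            # experts activated per token

    def total_params(self) -> float:
        # Everything that must be stored in memory.
        return self.shared_params + self.num_experts * self.expert_params

    def active_params(self) -> float:
        # Only the parameters that actually compute for a given token.
        return self.shared_params + self.top_k * self.expert_params


def flops_per_token(active_params: float) -> float:
    # Rule of thumb: about one multiply and one add per active parameter.
    return 2.0 * active_params


def weight_memory_gb(total_params: float, bytes_per_param: float) -> float:
    # Weights only; real deployments also need KV cache and activations.
    return total_params * bytes_per_param / 1e9


def fits(total_params: float, bytes_per_param: float, device_gb: float) -> bool:
    return weight_memory_gb(total_params, bytes_per_param) <= device_gb


if __name__ == "__main__":
    moe = MoEConfig(shared_params=1e9, expert_params=0.5e9, num_experts=8, top_k=2)
    dense_params = moe.active_params()  # a dense model with the same active size

    print("MoE   FLOPs/token:", flops_per_token(moe.active_params()))
    print("Dense FLOPs/token:", flops_per_token(dense_params))
    print("MoE   weights (GB):", weight_memory_gb(moe.total_params(), 2))
    print("Dense weights (GB):", weight_memory_gb(dense_params, 2))
    print("MoE fits in 8 GB:", fits(moe.total_params(), 2, 8.0))
    print("Dense fits in 8 GB:", fits(dense_params, 2, 8.0))
```

In this made-up configuration both models cost the same FLOPs per token, but the MoE model must keep all eight experts resident (10 GB of fp16 weights versus 4 GB), so it no longer fits the hypothetical 8 GB budget. This is one simple way a compute advantage on paper can turn into a memory problem on small devices.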

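Impact: MoE models may not deliver the expected inference speedups on resource-constrained devices, which has implications for deployment strategy.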

Ranking rationale: academic paper presenting a detailed empirical study of model performance.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Alfarizy Alfarizy, Hung Truong Thanh Nguyen, René Richard, Roozbeh Razavi-Far, Hung Cao · 2026-06-24 04:00


arXiv:2606.21428v2 · Announce Type: replace-cross

Abstract: Mixture-of-Experts (MoE) language models are often described as ideal for resource-constrained inference. Each token activates only a small subset of experts, so the per-token compute cost, in floating-point operations (FL…