VoltanaLLM: Energy-Efficient and SLO-Aware Disaggregated LLM Serving via Adaptive Frequency Control and State-Space Routing

VoltanaLLM system cuts LLM inference energy consumption by 36% while still meeting SLOs

By the PulseAugur editorial team · [1 source] · 2026-06-24 04:00

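A new system called VoltanaLLM has been developed to address the substantial energy consumption of large language model (LLM) inference. Described in a recent arXiv paper, the system uses adaptive frequency control and state-space routing to reduce energy use in both the prefill and decode phases of LLM serving. By identifying the optimal operating points for GPU frequency and routing requests intelligently, VoltanaLLM can achieve significant energy savings without compromising latency service-level objectives (SLOs).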
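
To make the idea of an "optimal operating point" concrete, here is a minimal Python sketch of the general technique of picking the lowest-energy GPU frequency whose predicted latency still meets an SLO. It is not the paper's algorithm: the `OperatingPoint` type, the `pick_frequency` helper, and every number in `POINTS` are hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class OperatingPoint:
    """One candidate GPU setting (hypothetical, for illustration only)."""
    freq_mhz: int       # GPU clock frequency
    latency_ms: float   # predicted per-request latency at this frequency
    power_w: float      # predicted average power draw at this frequency

    @property
    def energy_j(self) -> float:
        # Energy = power * time; latency is converted from ms to s.
        return self.power_w * self.latency_ms / 1000.0


def pick_frequency(
    points: Sequence[OperatingPoint], slo_ms: float
) -> Optional[OperatingPoint]:
    """Return the lowest-energy point whose predicted latency meets the SLO.

    Returns None when no candidate is fast enough; a real system would
    need its own fallback policy for that case.
    """
    feasible = [p for p in points if p.latency_ms <= slo_ms]
    if not feasible:
        return None
    return min(feasible, key=lambda p: p.energy_j)


# Placeholder numbers, not measurements from the paper.
POINTS = [
    OperatingPoint(freq_mhz=1000, latency_ms=60.0, power_w=200.0),
    OperatingPoint(freq_mhz=1400, latency_ms=45.0, power_w=300.0),
    OperatingPoint(freq_mhz=1800, latency_ms=38.0, power_w=450.0),
]

choice = pick_frequency(POINTS, slo_ms=50.0)
```

With these placeholder values, the slowest setting misses the 50 ms SLO, so the selection falls to the cheaper of the two remaining candidates. The point of the sketch is that running at maximum frequency is not required to meet a latency target, which is the kind of headroom an SLO-aware frequency controller can exploit.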

Impact: Could significantly reduce the operating costs and environmental footprint of large-scale LLM deployments.

Ranking rationale: A research paper detailing a new system for efficient LLM serving.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Jiahuan Yu, Aryan Taneja, Junfeng Lin, Minjia Zhang · 2026-06-24 04:00

VoltanaLLM: Energy-Efficient and SLO-Aware Disaggregated LLM Serving via Adaptive Frequency Control and State-Space Routing

arXiv:2509.04827v3. Abstract: The energy cost of Large Language Model (LLM) inference is rapidly becoming a barrier to sustainable and scalable deployment. Although modern serving architectures expose distinct prefill and decode behaviors, existing sys…
