Study Analyzes Disaggregated Inference, Revealing the Price of Anarchy Under GPU Saturation

By PulseAugur Editorial · [2 sources] · 2026-06-11 00:00

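A new research paper analyzes disaggregated inference architectures, which separate the prefill and decode phases onto distinct GPU pools. To the authors' knowledge, the study provides the first formal game-theoretic analysis of this setup, modeling it as coupled games involving resource allocation, caching, and request routing. It characterizes how GPU saturation affects the Price of Anarchy (PoA), showing that the PoA rises significantly at saturation because of latency and cache externalities. Building on this, the authors design an adaptive controller that tunes routing parameters to improve the operating point, demonstrating a large reduction in PoA at only a slight cost in throughput.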
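For readers unfamiliar with the term, the Price of Anarchy is the ratio between the total cost at a selfish equilibrium and the cost at the socially optimal assignment. The sketch below illustrates the idea with the classic two-route Pigou example. It is not the paper's model: the two "pools", their latency functions, and all function names are hypothetical and chosen only for illustration.

```python
# Toy Price of Anarchy calculation (Pigou's example), NOT the paper's model.
# Assumption: a unit mass of requests chooses between two hypothetical pools.
#   - congestible pool: latency equals the fraction x of requests routed to it
#   - fixed pool: latency is always 1.0


def social_cost(x: float) -> float:
    """Average latency when a fraction x of requests uses the congestible pool."""
    return x * x + (1.0 - x) * 1.0


def selfish_equilibrium() -> float:
    """Fraction on the congestible pool when each request minimizes its own latency.

    The congestible pool's latency x never exceeds 1.0, so no request can
    gain by switching to the fixed pool: everyone ends up on it (x = 1).
    """
    return 1.0


def social_optimum(steps: int = 1000) -> float:
    """Grid search for the split that minimizes average latency."""
    grid = [i / steps for i in range(steps + 1)]
    return min(grid, key=social_cost)


def price_of_anarchy() -> float:
    """Equilibrium cost divided by optimal cost (always >= 1)."""
    return social_cost(selfish_equilibrium()) / social_cost(social_optimum())


if __name__ == "__main__":
    print(f"optimal split: {social_optimum():.3f}")
    print(f"price of anarchy: {price_of_anarchy():.4f}")
```

In this toy setting the selfish outcome overloads the congestible pool, and the ratio works out to 4/3. A controller that nudges routing toward the optimal split would lower that ratio, which is the general kind of effect the summary attributes to the paper's adaptive controller.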

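Impact: This research offers insight into optimizing GPU resource allocation for inference and could lead to more efficient, more cost-effective AI deployments.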

Ranking rationale: An academic paper posted on arXiv that details a new analysis of, and a controller for, disaggregated inference architectures.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Athos Georgiou (NCA) · 2026-06-17 04:00

The Price of Anarchy in Disaggregated Inference

arXiv:2606.17081v1 Announce Type: cross Abstract: Disaggregated inference architectures physically separate prefill and decode phases onto distinct GPU pools, creating competing "agents" that share a fixed hardware budget. We provide, to our knowledge, the first formal game-theor…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-11 00:00

The Price of Anarchy in Disaggregated Inference

Disaggregated inference architectures separate prefill and decode phases across distinct GPU pools, and a game-theoretic analysis characterizes how GPU saturation affects system performance through regime transitions and payoff structure changes, enabling an adaptive controller t…
