Learning to Evict from Key-Value Cache

New reinforcement learning framework optimizes the LLM KV cache for efficient inference

By the PulseAugur editorial team · [1 source] · 2026-06-29 04:00

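Researchers have developed a framework called KV Policy (KVP) that addresses the memory demands of large language models (LLMs) by optimizing the key-value (KV) cache. KVP recasts KV cache eviction as a reinforcement learning problem, training lightweight agents to predict how useful each cached token will be for future decoding. The method significantly outperforms existing heuristic approaches on long-context and multi-turn dialogue benchmarks, and it generalizes to new tasks and longer sequence lengths without modifying the underlying LLM.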
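The eviction idea described above can be illustrated with a small sketch. This is not the paper's implementation: the linear scorer, the feature vectors, the weights, and the fixed `budget` are all hypothetical stand-ins, used only to show how a per-token usefulness score could decide which cache entries survive. In KVP the scores would come from trained agents rather than hand-picked weights.

```python
from typing import List, Sequence


def score_tokens(features: Sequence[Sequence[float]],
                 weights: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a learned agent: one linear score per cached token."""
    return [sum(f * w for f, w in zip(row, weights)) for row in features]


def evict_to_budget(scores: Sequence[float], budget: int) -> List[int]:
    """Return the positions of the cache entries to keep, in original order."""
    if budget < 0:
        raise ValueError("budget must be non-negative")
    if budget >= len(scores):
        return list(range(len(scores)))
    # Rank positions by predicted usefulness, highest first.
    # Ties are broken in favour of the more recent (higher-index) token.
    ranked = sorted(range(len(scores)), key=lambda i: (scores[i], i), reverse=True)
    return sorted(ranked[:budget])


# Toy cache of 5 tokens, each described by 2 made-up features (assumption).
features = [[1.0, 0.0], [0.2, 0.1], [0.9, 0.5], [0.1, 0.0], [0.3, 0.9]]
weights = [1.0, 0.5]  # assumption: a real system would learn these, e.g. with RL
scores = score_tokens(features, weights)
kept = evict_to_budget(scores, budget=3)
print(kept)  # positions of the 3 highest-scoring tokens
```

Heuristic eviction methods fit the same shape with a fixed scoring rule, such as recency or accumulated attention. The learned approach differs only in where the scores come from.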

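Impact: This research offers a more efficient approach to LLM inference, with the potential to lower compute costs and improve performance on long-context tasks.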

Ranking rationale: Academic paper detailing a new method for LLM inference optimization.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL · Luca Moschella, Laura Manduchi, Ozan Sener · 2026-06-29 04:00

Learning to Evict from Key-Value Cache

arXiv:2602.10238v2 Announce Type: replace Abstract: The growing size of Large Language Models (LLMs) makes efficient inference challenging, primarily due to the memory demands of the autoregressive Key-Value (KV) cache. Existing eviction or compression methods reduce cost but rel…

