IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

IntentKV prunes LLM agent KV caches, cutting token usage by 77%

By the PulseAugur editorial team · [1 source] · 2026-06-10 04:00

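Researchers have developed IntentKV, a method for pruning the KV cache of large language model agents to make inference more efficient. The technique maintains a session-level memory of cross-turn intent, which it uses to score cached tokens and selectively evict them without a significant loss in accuracy. IntentKV reportedly delivers substantial reductions in peak request tokens and KV reads, especially on long-horizon agent tasks, while leaving the base LLM unchanged.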
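To make the idea of intent-based scoring concrete, here is a minimal sketch of how a session-level intent memory could drive cache eviction. It is not the paper's algorithm: the `IntentMemory` class, the exponential moving average, the cosine-similarity score, and the `keep_ratio` budget are all assumptions chosen for illustration, and toy 2-D vectors stand in for real key tensors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class IntentMemory:
    """Hypothetical session-level intent: a moving average of per-turn query vectors."""

    def __init__(self, dim, decay=0.5):
        self.vec = [0.0] * dim
        self.decay = decay
        self.turns = 0

    def update(self, query_vec):
        # The first turn seeds the memory; later turns are blended in.
        if self.turns == 0:
            self.vec = list(query_vec)
        else:
            self.vec = [
                self.decay * old + (1 - self.decay) * new
                for old, new in zip(self.vec, query_vec)
            ]
        self.turns += 1


def prune_cache(keys, intent, keep_ratio):
    """Return the positions of cache entries to keep, in original order.

    Each entry is scored by similarity to the session intent (an assumed
    scoring rule), and only the top `keep_ratio` fraction survives.
    """
    budget = max(1, int(len(keys) * keep_ratio))
    scores = [cosine(key, intent.vec) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])


# Usage: two turns with the same intent, then prune half of a four-entry cache.
memory = IntentMemory(dim=2)
memory.update([1.0, 0.0])
memory.update([1.0, 0.0])
cache_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
kept = prune_cache(cache_keys, memory, keep_ratio=0.5)
print(kept)
```

In this toy run the entries most aligned with the accumulated intent are retained and the rest are dropped. A real system would score actual attention keys per layer and head, and the base model's weights would stay untouched, as the summary describes.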

Impact: Shrinking the KV cache of LLM agents could lower inference costs and support longer context windows.

Ranking rationale: This cluster contains a paper detailing a new optimization method for LLM inference.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI TIER_1 · Junjie Li, Jiong Lou, Jie Li · 2026-06-10 04:00

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

arXiv:2606.09916v1 Announce Type: cross Abstract: Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the key-…

