Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

Researchers have introduced IntentKV, a method for pruning the KV cache of large language model agents to improve inference efficiency. It maintains a session-level memory of cross-turn intent and uses that memory to score tokens and selectively drop them without significant accuracy loss. The authors report substantial reductions in peak request tokens and KV reads, particularly on long-horizon agent tasks, while leaving the base LLM unchanged.
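
To make the idea of scoring cached tokens against a session-level intent memory more concrete, here is a minimal toy sketch. It is not the paper's algorithm: the names `IntentMemory` and `prune_kv`, the exponential-moving-average update, the cosine-similarity score, and the `keep_ratio` parameter are all assumptions chosen for illustration, and real KV caches hold per-layer, per-head tensors rather than plain Python lists.

```python
import math
from dataclasses import dataclass, field


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class IntentMemory:
    """Hypothetical session-level intent memory: a running average of turn embeddings."""
    decay: float = 0.5  # assumed smoothing factor, not taken from the paper
    vector: list = field(default_factory=list)

    def update(self, turn_embedding):
        # The first turn initializes the memory; later turns are blended in.
        if not self.vector:
            self.vector = list(turn_embedding)
        else:
            self.vector = [
                self.decay * m + (1.0 - self.decay) * t
                for m, t in zip(self.vector, turn_embedding)
            ]


def prune_kv(cache, intent, keep_ratio):
    """Keep the fraction of cache entries most similar to the intent vector.

    cache: list of (token, key_vector) pairs, in sequence order.
    Returns the surviving entries in their original order.
    """
    if not cache:
        return []
    keep = max(1, math.ceil(len(cache) * keep_ratio))
    ranked = sorted(
        range(len(cache)),
        key=lambda i: cosine(cache[i][1], intent.vector),
        reverse=True,
    )
    kept_indices = sorted(ranked[:keep])  # restore sequence order
    return [cache[i] for i in kept_indices]


if __name__ == "__main__":
    memory = IntentMemory()
    memory.update([1.0, 0.0])  # turn 1 embedding (made-up values)
    memory.update([1.0, 0.0])  # turn 2 embedding (made-up values)
    cache = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]),
             ("c", [0.9, 0.1]), ("d", [-1.0, 0.0])]
    print([token for token, _ in prune_kv(cache, memory, keep_ratio=0.5)])
```

In this sketch the base model is never touched: only the list of cached entries shrinks, which mirrors the brief's point that the pruning happens around the LLM rather than inside it.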

IMPACT Reduces KV cache size for LLM agents, potentially lowering inference costs and enabling longer context windows.

Qwen3-8B
Qwen2.5-14B
IntentKV