PulseAugur / Brief

Brief

last 24h
[1/1] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

    Researchers have introduced IntentKV, a method for pruning KV caches in large language model agents to improve inference efficiency. It maintains a session-level memory of cross-turn intent and uses that memory to score tokens and selectively drop them without significant accuracy loss. IntentKV shows substantial reductions in peak request tokens and KV reads, particularly on long-horizon agent tasks, while leaving the base LLM unchanged.

    IMPACT Reduces KV cache size for LLM agents, potentially lowering inference costs and enabling longer context windows.
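
    The summary above gives only the general idea (a session-level intent memory used to score and drop cached tokens), so the following is a rough illustrative sketch, not IntentKV's actual algorithm. Everything in it is an assumption made for illustration: the names IntentMemory and prune_kv, the use of an exponential moving average as the intent memory, cosine similarity as the score, and a fixed keep budget.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class IntentMemory:
    """Hypothetical session-level intent state.

    Assumption: intent is tracked as an exponential moving average of
    per-turn query vectors. The source does not describe the real mechanism.
    """

    def __init__(self, dim: int, decay: float = 0.5) -> None:
        self.vector: List[float] = [0.0] * dim
        self.decay = decay
        self.turns = 0

    def update(self, turn_vector: Vector) -> None:
        if self.turns == 0:
            # First turn: the intent is simply that turn's vector.
            self.vector = list(turn_vector)
        else:
            # Later turns: blend the old intent with the new turn.
            self.vector = [
                self.decay * old + (1.0 - self.decay) * new
                for old, new in zip(self.vector, turn_vector)
            ]
        self.turns += 1


def prune_kv(keys: List[Vector], intent: IntentMemory, keep: int) -> List[int]:
    """Return indices of cached tokens to keep, in their original order.

    Assumption: each cached token is scored by the cosine similarity of its
    key vector to the session intent, and the top `keep` tokens survive.
    """
    scores = [cosine(key, intent.vector) for key in keys]
    ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


if __name__ == "__main__":
    memory = IntentMemory(dim=2)
    memory.update([1.0, 0.0])   # turn 1
    memory.update([1.0, 0.2])   # turn 2 shifts the intent slightly
    cached_keys = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]
    print(prune_kv(cached_keys, memory, keep=2))
```

    In this toy setup the pruning step runs outside the model and only selects which cache entries to retain, which mirrors the summary's point that the base LLM stays unchanged.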