Can I Buy Your KV Cache?
Researchers propose a novel method to reduce AI agent computation by precomputing and selling Key-Value (KV) caches for documents. This approach aims to eliminate redundant prefill computations, which are the most compute-intensive steps for large models. By allowing agents to load precomputed KV caches, the system can save significant computational resources, potentially reducing costs by up to 50x for popular documents. The proposed solution involves hosting these caches on a provider-side content delivery network (CDN) to avoid high egress costs. AI
IMPACT Could significantly reduce inference costs for AI agents by eliminating redundant computations.