Anthropic prompt caching, explained: cache_control markers, the two-tier write premium, and when it actually pays off
Anthropic has introduced a prompt caching feature that significantly reduces costs for users by caching the initial, stable portion of a prompt. This feature applies a premium on the first request to store the prompt's encoded state, but subsequent requests within a defined Time-To-Live (TTL) period receive a substantial discount. The system caches the model's internal representation of the prompt's static context, rather than the response itself, leading to potential savings of up to 90% on the cached input tokens. AI
IMPACT Reduces operational costs for developers using Anthropic's models by optimizing prompt processing.