We Measured LLM Prompt Caching in Production — Same Prompt, 0% to 91% Hit Rates
A study on LLM prompt caching in production revealed significant variations in hit rates across different models and providers, ranging from 0% to 91%. The research highlighted the importance of a specific `cache_control` marker for certain models like Gemini 3.1 Flash Lite, which otherwise showed no caching benefits. Additionally, the minimum prompt length required for caching to engage was found to be crucial, with shorter prompts failing to utilize the feature. AI
IMPACT Optimizing LLM infrastructure can significantly reduce costs and latency, improving user experience and operational efficiency.