OpenAI has introduced Prompt Caching for its API, offering developers significant cost and latency reductions. The feature automatically reuses recently processed input tokens, applying a 50% discount to the cached portion of prompts longer than 1,024 tokens. Prompt Caching is now active on the latest GPT-4o models and their fine-tuned versions, with caches typically cleared within an hour of inactivity. The aim is to help developers scale AI applications more efficiently by lowering operational expenses.
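To make the discount concrete, here is a small arithmetic sketch of the billing effect. The per-token price is a placeholder assumption for illustration, not a published rate; in practice the number of cached tokens for a request is reported by the API in the response's `usage.prompt_tokens_details.cached_tokens` field.

```python
# Illustrative cost arithmetic for Prompt Caching.
# Assumption: a hypothetical price of $2.50 per 1M input tokens.
PRICE_PER_INPUT_TOKEN = 2.50 / 1_000_000

def input_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Cost of a request where `cached_tokens` of the prompt hit the cache.

    Cached input tokens are billed at a 50% discount; the rest are full price.
    """
    uncached = prompt_tokens - cached_tokens
    return (uncached * PRICE_PER_INPUT_TOKEN
            + cached_tokens * PRICE_PER_INPUT_TOKEN * 0.5)

# A 2,048-token prompt: cold (no cache hit) vs. warm (first 1,024 tokens cached).
cold = input_cost(2048, 0)
warm = input_cost(2048, 1024)
print(f"cold: ${cold:.6f}  warm: ${warm:.6f}")
```

With half the prompt cached, the input cost drops by 25% overall, since only the cached half is discounted by 50%.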
Summary written by gemini-2.5-flash-lite from 1 source.