OpenAI has introduced Prompt Caching for its API, which lowers cost and latency for requests that repeat the same prompt prefix. The feature automatically reuses recently processed input tokens and bills those cached tokens at a 50% discount; it applies to prompts longer than 1,024 tokens. Prompt Caching is active on the latest GPT-4o models and their fine-tuned versions. Caches are typically cleared after 5 to 10 minutes of inactivity and are always removed within an hour of last use. The goal is to help developers scale AI applications more efficiently by lowering operating costs.
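To make the discount concrete, here is a rough sketch of the billing arithmetic, assuming cached input tokens cost half the normal input rate. The function name, its parameters, and the $2.50 per million tokens price are illustrative assumptions, not figures from the announcement.

```python
def estimate_input_cost(prompt_tokens, cached_tokens, price_per_million,
                        cached_discount=0.5):
    """Estimate the input cost in dollars for one request.

    cached_tokens is the part of the prompt served from the cache;
    cached_discount=0.5 mirrors the 50% discount described above.
    All names here are hypothetical, chosen for illustration.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")

    full_rate = price_per_million / 1_000_000  # dollars per token
    uncached_tokens = prompt_tokens - cached_tokens

    # Uncached tokens pay the full rate; cached tokens pay the discounted rate.
    return (uncached_tokens * full_rate
            + cached_tokens * full_rate * (1 - cached_discount))


if __name__ == "__main__":
    # Hypothetical price: $2.50 per million input tokens.
    without_cache = estimate_input_cost(10_000, 0, 2.50)
    with_cache = estimate_input_cost(10_000, 8_192, 2.50)
    print(f"no cache hit: ${without_cache:.5f}")
    print(f"cache hit:    ${with_cache:.5f}")
```

The saving grows with the share of the prompt that is cached, so long, stable prefixes such as system instructions benefit most.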
AI-generated summary · Google Gemini · from 1 source.