PulseAugur

OpenAI API introduces Prompt Caching for cost and latency savings

OpenAI has introduced Prompt Caching for its API to cut developers' costs and latency. The feature automatically reuses recently processed input tokens: on prompts longer than 1,024 tokens, the cached portion of the input is billed at a 50% discount. Prompt Caching is active on the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as fine-tuned versions of those models. Caches are typically cleared after 5 to 10 minutes of inactivity and are always removed within an hour of their last use. The aim is to help developers scale AI applications more efficiently by lowering operating costs.
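To make the pricing arithmetic concrete, here is a minimal sketch of how the discount might affect the input cost of a single request. The function name, the per-million-token price, and the exact treatment of the 1,024-token boundary are illustrative assumptions, not figures or behavior taken from the OpenAI announcement.

```python
# Hypothetical cost estimator; names and prices are illustrative only.
MIN_CACHEABLE_TOKENS = 1024  # threshold from the summary; exact boundary handling is assumed
CACHED_DISCOUNT = 0.5        # cached input tokens are billed at a 50% discount


def estimate_input_cost(prompt_tokens: int, cached_tokens: int,
                        price_per_million: float) -> float:
    """Estimate the input cost of one request in the same currency as the price."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    # Short prompts are assumed to get no cache discount at all.
    if prompt_tokens < MIN_CACHEABLE_TOKENS:
        cached_tokens = 0
    uncached_tokens = prompt_tokens - cached_tokens
    # Cached tokens count at the discounted rate, the rest at full price.
    billable_tokens = uncached_tokens + cached_tokens * (1 - CACHED_DISCOUNT)
    return billable_tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    price = 2.50  # assumed price per 1M input tokens, for illustration
    cold = estimate_input_cost(2000, 0, price)
    warm = estimate_input_cost(2000, 1024, price)
    print(f"cold: {cold:.5f}  warm: {warm:.5f}")
```

Under these assumptions, the saving grows with the share of the prompt that is served from cache, so requests that share a long common prefix would benefit most.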

RANK_REASON This is a product feature update for an existing API, not a new model release or major platform shift.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. OpenAI News · TIER_1 · English (EN)

    Prompt Caching in the API

    Offering automatic discounts on inputs that the model has recently seen