PulseAugur

Coinbase cuts AI spend 50% by optimizing model routing and caching

Coinbase has halved its AI spend by changing how engineers use models and how requests are served. Engineers now default to cheaper open-weight models such as GLM 5.2 and Kimi 2.7, and can still opt in to more powerful, more expensive models when a task needs them. The other levers were better caching, task-based routing, and visibility into per-engineer token usage. Together these cut costs significantly without hurting developer productivity.
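The levers named in the summary (a cheap default with opt-in escalation, task-based routing, caching, and per-engineer token accounting) can be sketched in a few lines of Python. This is only an illustration, not Coinbase's implementation: the model labels, the `PREMIUM_TASKS` mapping, the `Router` class, and the `fake_model` stub are all hypothetical, and caching is modeled here as a simple exact-match response cache, which may differ from the caching the source describes.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical tier labels and task mapping, chosen for illustration only.
DEFAULT_MODEL = "open-weight-default"
PREMIUM_MODEL = "premium-frontier"
PREMIUM_TASKS = {"architecture_review"}  # assumed: tasks routed to the premium tier


@dataclass
class Router:
    # Exact-match response cache keyed by (model, prompt).
    cache: dict = field(default_factory=dict)
    # Running token totals per engineer, for usage visibility.
    tokens_by_engineer: dict = field(default_factory=lambda: defaultdict(int))

    def pick_model(self, task, opt_in_premium=False):
        # Cheap model by default; escalate on explicit opt-in or by task type.
        if opt_in_premium or task in PREMIUM_TASKS:
            return PREMIUM_MODEL
        return DEFAULT_MODEL

    def complete(self, engineer, task, prompt, call_model, opt_in_premium=False):
        model = self.pick_model(task, opt_in_premium)
        key = (model, prompt)
        if key in self.cache:
            # Cache hit: no model call, no tokens charged.
            return self.cache[key], model, True
        text, tokens_used = call_model(model, prompt)
        self.cache[key] = text
        self.tokens_by_engineer[engineer] += tokens_used
        return text, model, False


def fake_model(model, prompt):
    # Stand-in for a real model call; counts whitespace-separated words as tokens.
    return f"[{model}] {prompt}", len(prompt.split())


router = Router()
first = router.complete("alice", "autocomplete", "fix the bug", fake_model)
second = router.complete("alice", "autocomplete", "fix the bug", fake_model)
escalated = router.complete("bob", "autocomplete", "fix the bug", fake_model,
                            opt_in_premium=True)
```

In this sketch the second identical request is served from the cache and adds nothing to the engineer's token total, while an explicit opt-in still reaches the premium tier without any access cap.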

IMPACT Demonstrates practical strategies for reducing AI operational costs, potentially influencing enterprise adoption of more efficient model routing and caching techniques.

RANK_REASON This item details infrastructure and cost-optimization strategies for AI usage within a specific company, rather than a new model release or fundamental research.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. dev.to (LLM tag) · TIER_1 · English (EN) · Andrew Kew

    Coinbase Cut Its AI Spend in Half Without Throttling Engineers - Here's the Playbook

    Coinbase halved its AI spend while token usage kept growing exponentially. CEO Brian Armstrong posted the breakdown on X this week: five concrete levers, no access caps, and 91% of engineers never hit the old usage limits.

    That last point matters. This isn't a story ab…