Buildkite cuts LLM calls 58% with semantic caching

By PulseAugur Editorial · [1 source] · 2026-06-22 13:22

Buildkite enabled semantic caching for its internal flaky-test summarizer, which had been making about 40k LLM calls a day, most of them near-duplicates of failures it had already explained. With the Bifrost gateway caching summaries by meaning rather than by exact text, live calls to providers such as anthropic/claude-haiku and openai/gpt-4o-mini fell by 58%. Cache hits also lowered p50 latency, and the cache provided resilience during an 11-minute provider outage, so the change helped both cost and reliability.
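To make "caching by meaning rather than exact text" concrete, here is a minimal sketch of the idea. It is not Bifrost's implementation or its API: `SemanticCache`, `summarise`, and `fake_llm` are hypothetical names, the bag-of-words `embed` function is a toy stand-in for a real embedding model, and the 0.8 similarity threshold is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical cache keyed on similarity instead of exact text."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value; real systems tune this
        self.entries = []           # list of (vector, summary) pairs

    def lookup(self, prompt):
        """Return the closest cached summary, or None below the threshold."""
        vec = embed(prompt)
        best_score, best_summary = 0.0, None
        for cached_vec, summary in self.entries:
            score = cosine(vec, cached_vec)
            if score > best_score:
                best_score, best_summary = score, summary
        return best_summary if best_score >= self.threshold else None

    def store(self, prompt, summary):
        self.entries.append((embed(prompt), summary))


def summarise(prompt, cache, call_llm):
    """Return (summary, was_cache_hit); call the provider only on a miss."""
    cached = cache.lookup(prompt)
    if cached is not None:
        return cached, True
    summary = call_llm(prompt)
    cache.store(prompt, summary)
    return summary, False


# Demo with a fake provider that records every live call it receives.
live_calls = []


def fake_llm(prompt):
    live_calls.append(prompt)
    return "summary #%d" % len(live_calls)


cache = SemanticCache(threshold=0.8)
first = summarise("test_checkout timed out after 30s waiting for payment service", cache, fake_llm)
second = summarise("test_checkout timed out after 31s waiting for payment service", cache, fake_llm)
third = summarise("test_login failed: database connection refused", cache, fake_llm)
```

In this demo the second failure differs from the first only in the timeout value, so it is served from the cache, while the unrelated third failure triggers a second live call. Because a hit never reaches the provider, the same lookup path can keep answering near-duplicate requests while a provider is unavailable.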

Impact: Demonstrates a practical method for reducing LLM operational costs and improving reliability through semantic caching.

Ranking rationale: This is a technical implementation detail about optimizing LLM usage within a specific company's product, not a frontier release or significant industry event.


AI-generated summary · Google Gemini · based on 1 source.


Sources [1]

dev.to (LLM tag) · TIER_1 · English (EN) · claire nguyen · 2026-06-22 13:22

Semantic caching our flaky-test summariser: 58% fewer LLM calls

TL;DR: Our internal flaky-test summariser at Buildkite was firing ~40k LLM calls a day, and most were near-duplicates of failures we'd already explained. Switching on semantic caching in Bifrost cut live provider calls by 58% and dropped p50 latency on cache hits from …
