PulseAugur

DecagonAI cuts voice agent costs 6x with Together AI and open models

DecagonAI has cut its voice agent's cost per turn nearly sixfold by migrating from closed models to fine-tuned open models hosted on Together AI. The transition kept latency low enough for real-time voice interactions, with p95 model latency under 400ms per turn. The optimization involved custom speculators, prompt caching, and deployment on NVIDIA Blackwell hardware, and the setup enables frequent model updates.
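For readers unfamiliar with the metric, a p95 latency under 400ms means that 95% of turns received a model response in less than 400ms. The following is a minimal sketch of how such a figure could be checked, assuming the nearest-rank percentile definition; the sample values, the `p95` helper, and the `budget_ms` name are hypothetical and are not DecagonAI or Together AI data or code.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a non-empty list of latencies (ms)."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n gives the 1-based nearest rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]

# Hypothetical per-turn model latencies in milliseconds (illustrative only).
samples = [210, 250, 230, 390, 260, 240, 310, 220, 280, 270,
           235, 245, 255, 265, 275, 285, 295, 305, 315, 520]

budget_ms = 400  # the threshold quoted in the summary
value = p95(samples)
meets_budget = value < budget_ms
print(value, meets_budget)  # 390 True
```

Note that one slow turn (520ms here) does not break a p95 target, which is why percentile budgets are commonly used for real-time systems instead of worst-case limits.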

IMPACT Demonstrates that migrating from proprietary to fine-tuned open-source models can deliver significant cost savings for specialized applications while keeping latency low enough for real-time use.

RANK_REASON This is a case study of a company using AI infrastructure to improve its product, not a release of a new frontier model or core research.

Read on X — Together (inference / OSS) →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. X — Together (inference / OSS) TIER_1 English(EN) · togethercompute ·

    .@DecagonAI cut voice agent cost per turn nearly 6x with Together AI. They moved from closed models to fine-tuned open models, while keeping latency low enough for real-time voice: → <400ms p95 model latency per turn → custom speculators and prompt caching → optimized