DecagonAI has cut the cost of its voice agent nearly sixfold by migrating from closed models to fine-tuned open-source models hosted on Together AI. The transition preserved the low latency that real-time voice interactions require, with p95 model latency under 400ms per turn. The optimization involved custom speculators, prompt caching, and deployment on NVIDIA Blackwell hardware, and it enables frequent model updates.
IMPACT Demonstrates that migrating from proprietary to fine-tuned open-source models can substantially cut costs for specialized applications while keeping latency low.
RANK_REASON This is a case study of a company using AI infrastructure to improve its product, not a release of a new frontier model or core research.
Read on X — Together (inference / OSS) →
AI-generated summary · Google Gemini · from 1 source.