AssemblyAI benchmarks STT latency, prioritizing accuracy over raw speed

By PulseAugur Editorial · [1 source] · 2026-06-23 20:50

AssemblyAI has released benchmarks for real-time speech-to-text (STT) latency. It argues that the lowest latency number does not always make for the best voice agent. The company's position is that "fast enough plus accurate" beats "fastest but wrong": a voice agent has to balance speed against accuracy, or it risks misinterpreting crucial information. AssemblyAI highlights Time to First Token (TTFT) and Time to Complete Turn (TTCT) as key metrics. It also stresses that P95 latency, the value that 95% of requests stay under, matters more in production environments than median (P50) latency. Its Universal-3.5 Pro Realtime model reportedly achieves a competitive 6.99% word error rate (WER) on real-world voice agent audio benchmarks.
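To make the metrics concrete, here is a minimal sketch of how TTFT, TTCT, and P50/P95 could be computed from per-turn timestamps. It is not AssemblyAI's benchmark harness. The `Turn` fields and the exact start and end points used for TTFT and TTCT are assumptions. Percentiles use the nearest-rank method, and every sample value is hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """Timestamps (ms, one shared clock) for one user turn; field names are hypothetical."""
    audio_start_ms: float   # first audio chunk sent to the STT stream
    first_token_ms: float   # first partial transcript token received
    speech_end_ms: float    # user stopped speaking
    final_ms: float         # final transcript for the turn received


def ttft(turn: Turn) -> float:
    # Assumed definition: from start of audio to the first transcript token.
    return turn.first_token_ms - turn.audio_start_ms


def ttct(turn: Turn) -> float:
    # Assumed definition: from end of speech to the finalized turn transcript.
    return turn.final_ms - turn.speech_end_ms


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    # Compute p * n / 100 (not p / 100 * n) to limit float rounding in the rank.
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical TTCT samples (ms): nine fast turns and one slow outlier.
samples = [310, 290, 305, 320, 300, 295, 315, 298, 302, 1200]
print(percentile(samples, 50), percentile(samples, 95))  # prints: 302 1200
```

With these made-up samples, the median is 302 ms but P95 is 1200 ms. One slow turn in ten barely moves P50, yet it sets P95. That gap is why a tail percentile is a better guide to what production users actually experience.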

IMPACT Highlights the critical balance between speed and accuracy for voice agents, influencing STT model selection.

RANK_REASON Product benchmark release from a non-frontier lab.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

AssemblyAI blog TIER_1 Romanian (RO) · 2026-06-23 20:50

Real

The lowest latency number on a chart won't win your voice agent eval. Here's what to actually measure — TTFT, TTCT, P95 — and what "fast enough" really means.
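The "fast enough" idea can be read as a selection rule. Treat P95 latency as a budget that a model must meet, and then choose the most accurate model among those that meet it. The following is one hypothetical way to express that rule. The `Candidate` fields, the 800 ms budget, and all numbers are illustrative assumptions, not AssemblyAI figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    p95_ttct_ms: float  # measured P95 time to complete turn (hypothetical)
    wer: float          # word error rate on your own eval audio, 0..1 (hypothetical)


def pick_model(candidates: List[Candidate], p95_budget_ms: float) -> Optional[Candidate]:
    """Return the most accurate model whose P95 latency fits the budget.

    Models over budget are excluded outright. Among the rest, the lowest WER
    wins, with lower P95 latency as the tie-breaker. Returns None if no model fits.
    """
    eligible = [c for c in candidates if c.p95_ttct_ms <= p95_budget_ms]
    if not eligible:
        return None
    return min(eligible, key=lambda c: (c.wer, c.p95_ttct_ms))


# Illustrative numbers only; these are not published benchmark results.
models = [
    Candidate("fastest", p95_ttct_ms=350, wer=0.12),
    Candidate("balanced", p95_ttct_ms=600, wer=0.07),
    Candidate("slow", p95_ttct_ms=1500, wer=0.06),
]
print(pick_model(models, p95_budget_ms=800).name)  # prints: balanced
```

In this example the fastest option loses. Once a model's P95 fits the budget, extra speed adds little, while a lower error rate directly reduces misheard requests. The slow model is excluded despite its lower WER because its tail latency breaks the budget.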
