Miso Labs has released MisoTTS, an open-weights, 8-billion-parameter text-to-speech model that generates expressive speech by conditioning on both text and audio context. Drawing on Sesame's CSM architecture, the model uses residual vector quantization (RVQ) to broaden its acoustic range without increasing its parameter count. Miso Labs claims significantly lower latency than competitors such as ElevenLabs and Sesame, though these capabilities and claims still require third-party validation.
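Residual vector quantization is a general technique, not something specific to MisoTTS. The sketch below is a minimal pure-Python illustration of the core idea: each stage quantizes the residual left over by the previous stages, so adding stages refines the reconstruction. It assumes tiny hand-picked 2-D codebooks; the `CODEBOOKS` values and every function name are hypothetical and are not taken from MisoTTS or CSM, whose codebooks are learned and much larger.

```python
"""Minimal residual vector quantization (RVQ) sketch with toy codebooks."""

from typing import List, Sequence

Vector = List[float]

# Hypothetical toy codebooks (2-D, four entries per stage). Real audio codecs
# learn their codebooks from data; these are hand-picked so each stage is finer.
CODEBOOKS: List[List[Vector]] = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]],          # coarse
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],      # finer
    [[0.0, 0.0], [0.0625, 0.0], [0.0, 0.0625], [-0.0625, 0.0]],  # finest
]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(codebook: List[Vector], vec: Sequence[float]) -> int:
    """Index of the codeword closest to vec (the first one wins on ties)."""
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], vec))


def rvq_encode(vec: Sequence[float], codebooks: List[List[Vector]]) -> List[int]:
    """Quantize vec stage by stage; each stage encodes the previous residual."""
    residual = list(vec)
    codes: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen codeword so the next stage sees only what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: List[List[Vector]]) -> Vector:
    """Reconstruct by summing the selected codeword from each stage."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(codes, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


if __name__ == "__main__":
    x = [1.3125, 1.125]
    codes = rvq_encode(x, CODEBOOKS)
    print("codes:", codes)  # codes: [1, 1, 2]
    # Decode with only the first n stages to see the error shrink.
    for n in range(1, len(CODEBOOKS) + 1):
        approx = rvq_decode(codes[:n], CODEBOOKS[:n])
        print(n, "stage(s):", approx, "squared error:", sq_dist(approx, x))
```

Each input is stored as one small index per stage, and the squared error drops as more stages are decoded. The toy values are dyadic fractions, so the floating-point arithmetic here is exact.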
IMPACT: Could set a new benchmark for low-latency, context-aware TTS, with potential implications for voice cloning and AI assistant development.
RANK_REASON: Model release from a lab with open weights and a novel architecture.
AI-generated summary · Google Gemini · from 1 source.