Miso Labs has released Miso-TTS, an 8-billion-parameter text-to-speech model. It uses the Sesame CSM architecture to generate audio codes from text and optional audio context. The model pairs a large Llama 3.2-style backbone with a smaller autoregressive audio decoder, enabling high-quality conversational speech and voice continuation.
AI IMPACT Enables new applications in voice generation and audio content creation.
RANK_REASON Release of a new open-source text-to-speech model.
AI-generated summary · Google Gemini · from 1 source.