Miso-TTS: 8-billion-parameter text-to-speech model released
Miso Labs has released Miso-TTS, a text-to-speech model with 8 billion parameters. The model uses the Sesame CSM architecture, generating audio codes from text and optional audio context. It pairs a large Llama 3.2-style backbone with a smaller autoregressive audio decoder, enabling high-quality conversational speech and voice continuation.
AI IMPACT: Enables new applications in voice generation and audio content creation.
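To make the backbone-plus-decoder description concrete, here is a minimal toy sketch of the generation loop it implies: a large model summarizes the text and any audio context, and a smaller decoder then emits the codes of each audio frame one codebook at a time. Everything in it is an assumption for illustration only. The names `toy_backbone`, `toy_decoder`, and `generate`, the codebook count and size, and the hash-based "models" are hypothetical stand-ins, not the Miso-TTS or Sesame CSM API, and no real weights or audio codec are involved.

```python
import random

NUM_CODEBOOKS = 4    # hypothetical: number of codes per audio frame
CODEBOOK_SIZE = 16   # hypothetical: number of entries in each codebook


def toy_backbone(text_tokens, audio_frames):
    """Stand-in for the large backbone: fold the whole history into one integer state."""
    state = 0
    for tok in text_tokens:
        state = (state * 31 + tok) % 1_000_003
    for frame in audio_frames:
        for code in frame:
            state = (state * 31 + code + 1) % 1_000_003
    return state


def toy_decoder(state):
    """Stand-in for the smaller decoder: emit one frame, one codebook at a time."""
    frame = []
    for level in range(NUM_CODEBOOKS):
        # Each code depends on the backbone state and on the codes
        # already emitted earlier in this same frame (autoregressive).
        rng = random.Random(state * 1009 + level * 101 + sum(frame))
        frame.append(rng.randrange(CODEBOOK_SIZE))
    return frame


def generate(text_tokens, context_frames=None, num_frames=5):
    """Generate audio-code frames from text and optional audio context."""
    history = list(context_frames or [])  # optional audio context (voice continuation)
    new_frames = []
    for _ in range(num_frames):
        state = toy_backbone(text_tokens, history)
        frame = toy_decoder(state)
        history.append(frame)  # each new frame conditions the next step
        new_frames.append(frame)
    return new_frames


if __name__ == "__main__":
    print(generate([5, 17, 42], num_frames=3))
```

Passing `context_frames` corresponds to the voice-continuation case: earlier audio codes are placed in the history before any new frame is generated. In a real system the resulting codes would be handed to an audio codec to produce a waveform, a step this sketch leaves out.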