Brief · PulseAugur

TOOL · r/StableDiffusion · English (EN) · 2h

Miso-TTS: 8-billion-parameter text-to-speech model released

Miso Labs has released Miso-TTS, a text-to-speech model with 8 billion parameters. The model uses the Sesame CSM architecture, which generates audio codes from text and optional audio context. It pairs a large Llama 3.2-style backbone with a smaller autoregressive audio decoder, which enables high-quality conversational speech and voice continuation.
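To make the two-stage design concrete, here is a toy sketch of how a large backbone and a small autoregressive decoder might share the work of producing audio codes. It is not Miso-TTS or Sesame CSM code: the function names, the codebook count and size, and the random stand-ins for the networks are all hypothetical, and the split shown (backbone picks the first code of each frame, decoder fills in the rest) is an assumption based on how CSM-style models are usually described.

```python
import random
from typing import List, Optional

NUM_CODEBOOKS = 4    # hypothetical; chosen only to keep the toy small
CODEBOOK_SIZE = 16   # hypothetical number of entries per codebook


def backbone_step(text_tokens: List[int], frames: List[List[int]],
                  rng: random.Random) -> int:
    """Stand-in for the large backbone: given the text and all frames so
    far, choose the first code of the next frame (random here)."""
    return rng.randrange(CODEBOOK_SIZE)


def decoder_step(partial_frame: List[int], rng: random.Random) -> int:
    """Stand-in for the small audio decoder: given the codes already chosen
    for the current frame, choose the next one (random here)."""
    return rng.randrange(CODEBOOK_SIZE)


def generate_audio_codes(text_tokens: List[int],
                         context_frames: Optional[List[List[int]]] = None,
                         num_frames: int = 3,
                         seed: int = 0) -> List[List[int]]:
    """Generate num_frames new frames of audio codes, optionally continuing
    from earlier audio supplied as context_frames."""
    rng = random.Random(seed)
    frames = [list(f) for f in (context_frames or [])]
    n_context = len(frames)
    for _ in range(num_frames):
        # Stage 1: the backbone proposes the first code of the frame.
        frame = [backbone_step(text_tokens, frames, rng)]
        # Stage 2: the decoder fills in the remaining codes one at a time.
        while len(frame) < NUM_CODEBOOKS:
            frame.append(decoder_step(frame, rng))
        frames.append(frame)
    # Return only the newly generated frames, not the context.
    return frames[n_context:]


if __name__ == "__main__":
    print(generate_audio_codes([1, 2, 3], context_frames=[[0, 1, 2, 3]]))
```

Passing earlier frames as context_frames corresponds to the "optional audio context" in the summary: new frames are generated as a continuation of that audio. In a real system a neural audio codec would then turn the code frames into a waveform.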

IMPACT Enables new applications in voice generation and audio content creation.

Llama 3.2
Miso Labs
Sesame CSM
Miso-TTS