Miso Labs releases 8B-parameter text-to-speech model

By PulseAugur Editorial · [1 sources] · 2026-06-02 17:03

Miso Labs has released Miso-TTS, a text-to-speech model with 8 billion parameters. The model uses the Sesame CSM architecture, which generates audio codes from text and optional audio context. It pairs a large Llama 3.2-style backbone with a smaller autoregressive audio decoder, enabling high-quality conversational speech and voice continuation.
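The two-stage generation described above can be illustrated with a toy sketch. It assumes a split in which the backbone picks the first audio code of each frame and the smaller decoder fills in the remaining codes autoregressively; that split, the codebook count, the codebook size, and every function name here are hypothetical stand-ins, and random choices take the place of real model predictions. It is not the Miso-TTS or Sesame CSM API.

```python
import random
from typing import List, Sequence

NUM_CODEBOOKS = 4   # hypothetical: codes per audio frame (not stated in the summary)
CODEBOOK_SIZE = 16  # hypothetical: number of possible values per code


def toy_backbone(text_tokens: Sequence[int], frames: List[List[int]],
                 rng: random.Random) -> int:
    """Stand-in for the large backbone: returns the first code of the next frame.

    A real model would condition on text_tokens and all earlier frames;
    this toy version just samples at random.
    """
    return rng.randrange(CODEBOOK_SIZE)


def toy_decoder(partial_frame: List[int], rng: random.Random) -> int:
    """Stand-in for the small audio decoder: returns the next code in the frame."""
    return rng.randrange(CODEBOOK_SIZE)


def generate_frames(text_tokens: Sequence[int],
                    audio_context: Sequence[Sequence[int]] = (),
                    num_frames: int = 3,
                    seed: int = 0) -> List[List[int]]:
    """Generate num_frames new frames of audio codes after the optional context."""
    rng = random.Random(seed)
    frames = [list(f) for f in audio_context]  # optional prior audio, as code frames
    new_frames: List[List[int]] = []
    for _ in range(num_frames):
        frame = [toy_backbone(text_tokens, frames, rng)]
        # Fill in the rest of the frame one code at a time.
        while len(frame) < NUM_CODEBOOKS:
            frame.append(toy_decoder(frame, rng))
        frames.append(frame)
        new_frames.append(frame)
    return new_frames


if __name__ == "__main__":
    for frame in generate_frames([1, 2, 3], num_frames=2):
        print(frame)
```

Passing earlier frames as `audio_context` mirrors the voice-continuation use case: new frames are appended after the supplied context rather than starting from silence.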

IMPACT Enables new applications in voice generation and audio content creation.

RANK_REASON Release of a new open-source text-to-speech model.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

r/StableDiffusion TIER_2 English(EN) · /u/AgeNo5351 · 2026-06-02 17:03

MISO-TTS . 8 Billion text2speech model released.

https://www.reddit.com/r/StableDiffusion/comments/1tux5qx/misotts_8_billion_text2speech_model_released/

