Mistral AI has released Voxtral TTS, an open-weights text-to-speech model that rivals ElevenLabs in performance while being significantly more efficient. The 4B-parameter model supports nine languages and uses a novel architecture that combines autoregressive generation of semantic tokens with flow matching for acoustic tokens. The release underscores Mistral's commitment to open research and to expanding the frontier of multimodal AI capabilities.