Zyphra has released ZONOS2, an open-source, real-time text-to-speech model featuring 8 billion total parameters and 900 million active parameters for efficient inference. This sparse Mixture-of-Experts model excels at high-fidelity, zero-shot voice cloning and aims to overcome the typical trade-off between speech quality and speed. ZONOS2 processes raw UTF-8 bytes instead of phonemes, improving support for multiple languages and code-switching, and was trained on over 6 million hours of audio data.
AI IMPACT This sparse MoE TTS model offers high-fidelity voice cloning and real-time performance, potentially setting new benchmarks for expressive speech synthesis.
RANK_REASON The item describes the release of a new open-source TTS model with specific technical details and benchmark comparisons.
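To make the "raw UTF-8 bytes instead of phonemes" point concrete, here is a minimal sketch of what byte-level text input looks like in general. It is an illustration only: the summary does not describe ZONOS2's actual front end, and the helper name `text_to_byte_ids` is hypothetical, not part of any Zyphra API.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Map text to its UTF-8 byte values (each in 0..255).

    Hypothetical helper for illustration; not ZONOS2's tokenizer.
    """
    return list(text.encode("utf-8"))


# A code-switched input: English and Chinese in one string.
mixed = "Hello, 世界"
ids = text_to_byte_ids(mixed)

# The vocabulary is fixed at 256 symbols regardless of language,
# so no per-language phoneme dictionary is required.
assert all(0 <= b < 256 for b in ids)

# The mapping is lossless: the original text is recoverable.
assert bytes(ids).decode("utf-8") == mixed

print(len(mixed), "characters ->", len(ids), "bytes")
```

Because every script maps onto the same 256 byte values, mixed-language text needs no language detection or grapheme-to-phoneme step before it reaches the model. The cost is longer input sequences for non-Latin scripts, since those characters occupy several bytes each.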
- Apache 2.0
- Cartesia Sonic 3.5
- ElevenLabs V3
- Fish S2 Pro
- Gemini 3.1 Flash
- Inworld TTS 2
- Qwen 3 TTS 1.7B
- VoxCPM 2
- ZONOS2
- Zyphra
AI-generated summary · Google Gemini · from 1 source.