ZONOS2: real-time TTS with 8B params, 900M active, and high-fidelity voice cloning
Zyphra has released ZONOS2, an open-source, real-time text-to-speech (TTS) model. It is a sparse Mixture-of-Experts (MoE) model with 8 billion total parameters, of which 900 million (about 11%) are active, which keeps inference efficient. The model is designed for high-fidelity, zero-shot voice cloning and aims to overcome the usual trade-off between speech quality and speed. ZONOS2 processes raw UTF-8 bytes instead of phonemes, which improves support for multiple languages and code-switching, and it was trained on over 6 million hours of audio data.
AI IMPACT: This sparse MoE TTS model combines high-fidelity voice cloning with real-time performance and could set new benchmarks for expressive speech synthesis.
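To illustrate what processing raw UTF-8 bytes instead of phonemes means in practice, here is a minimal sketch in Python. It is not ZONOS2's actual input pipeline, which the summary does not describe; the function names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical and exist only for this example. The sketch shows that any text, including a code-switched sentence, maps to a sequence of integers in the range 0 to 255 with no language-specific phonemizer or vocabulary.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as its raw UTF-8 byte values (hypothetical helper, one ID per byte)."""
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Invert text_to_byte_ids by decoding the byte values back to a string."""
    return bytes(ids).decode("utf-8")


# A code-switched sample: English followed by Chinese characters.
sample = "Hello, 世界"
ids = text_to_byte_ids(sample)

# ASCII characters take one byte each; each of the two CJK characters takes three.
print(len(sample), "characters ->", len(ids), "byte IDs")
print(ids)
print(byte_ids_to_text(ids) == sample)
```

The fixed 256-value input space is the appeal of this design: a new language or a mid-sentence language switch needs no extra lexicon, at the cost of longer input sequences for non-ASCII scripts.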