Brief · PulseAugur

TOOL · r/LocalLLaMA · English (EN) · 2d

ZONOS2: real-time TTS with 8B params, 900M active, and high-fidelity voice cloning

Zyphra has released ZONOS2, an open-source, real-time text-to-speech model with 8 billion total parameters, of which 900 million are active during inference. This sparse Mixture-of-Experts model targets high-fidelity, zero-shot voice cloning and aims to overcome the typical trade-off between speech quality and speed. ZONOS2 processes raw UTF-8 bytes instead of phonemes, which improves support for multiple languages and code-switching, and it was trained on over 6 million hours of audio data.
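To make the two headline numbers concrete, here is a minimal sketch of what "raw UTF-8 bytes instead of phonemes" and "900M active out of 8B" mean in practice. It is not ZONOS2's code or API: the function names `text_to_byte_ids` and `active_fraction` are hypothetical, and the only assumption is that each input byte is used directly as a token ID in the range 0 to 255.

```python
# Illustrative sketch only; names are hypothetical, not part of ZONOS2.

def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as raw UTF-8 bytes, one integer ID (0-255) per byte."""
    return list(text.encode("utf-8"))


def active_fraction(active_params: int, total_params: int) -> float:
    """Share of a sparse MoE model's parameters that is active at once."""
    return active_params / total_params


# Code-switched input needs no phonemizer or per-language lexicon:
# ASCII characters take 1 byte each, the two CJK characters take 3 bytes each.
mixed = "Hello, 世界"
ids = text_to_byte_ids(mixed)
print(len(ids))                              # 7 ASCII bytes + 6 CJK bytes
print(bytes(ids).decode("utf-8") == mixed)   # byte IDs round-trip losslessly

# Parameter counts as reported in the brief: 900M active of 8B total.
print(active_fraction(900_000_000, 8_000_000_000))
```

With byte input the vocabulary is fixed at 256 symbols regardless of language, which is why mixed-language text needs no special handling at the input stage; the cost is longer sequences for non-ASCII scripts.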

IMPACT This sparse MoE TTS model offers high-fidelity voice cloning and real-time performance, potentially setting new benchmarks for expressive speech synthesis.

Apache 2.0
ElevenLabs V3
Zyphra
Gemini 3.1 Flash
VoxCPM 2
ZONOS2
Qwen 3 TTS 1.7B
Cartesia Sonic 3.5
Fish S2 Pro
Inworld TTS 2