Researchers have introduced ZONOS2 8B, a new text-to-speech (TTS) model that advances naturalness, prosody, and voice cloning. The model scales to 8 billion parameters using a mixture-of-experts (MoE) architecture, which improves inference speed and throughput. Its training corpus has been expanded to over 6 million hours, and a simplified post-training process further improves quality and speaker-similarity metrics. ZONOS2 8B reports competitive results on various benchmarks, including ZTTS1-Eval, a new TTS benchmark introduced with the model, while maintaining efficient streaming latency. The model weights and inference code are publicly available under an Apache 2.0 license.
IMPACT This release provides an openly licensed TTS model with improved naturalness and voice cloning, which may benefit applications that require high-fidelity synthetic speech.
RANK_REASON The cluster describes a new model release with a technical report and publicly available weights, fitting the research category.
AI-generated summary · Google Gemini · from 1 source.