ZONOS2 8B TTS model released, trained on 6M hours of data

By PulseAugur Editorial · [1 source] · 2026-06-24 04:00

Researchers have introduced ZONOS2 8B, a text-to-speech (TTS) model that, according to its technical report, achieves state-of-the-art naturalness, prosody, and voice cloning fidelity. The model scales from the 1.6 billion parameters of its predecessor, Zonos-v0.1, to 8 billion parameters and uses a mixture-of-experts (MoE) architecture to improve inference speed and throughput. Its training corpus has been expanded to more than 6 million hours, and a simplified post-training process further improves quality and speaker-similarity metrics. ZONOS2 8B reports competitive results on several benchmarks, including ZTTS1-Eval, a new TTS benchmark introduced by the authors, while maintaining low streaming latency. The model weights and inference code are publicly available under an Apache 2.0 license.

IMPACT The release makes an openly licensed TTS model with improved naturalness and voice cloning available, which could benefit applications that need high-fidelity synthetic speech.

RANK_REASON The cluster describes a new model release with a technical report and publicly available weights, fitting the research category.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Gabriel Clark, Sofian Mejjoute, Mohamed Osman, George Close, Beren Millidge · 2026-06-24 04:00

ZONOS2 Technical Report

arXiv:2606.24320v1 Announce Type: cross Abstract: We present ZONOS2 8B, our latest TTS model, which achieves state-of-the-art naturalness, prosody, and voice cloning fidelity. We improve upon Zonos-v0.1 across scale, data, and training recipe. We scale the model from 1.6B to 8B t…
