PulseAugur

Zyphra releases ZONOS2, an 8B parameter real-time TTS model

Zyphra has released ZONOS2, an open-source, real-time text-to-speech model with 8 billion total parameters, of which 900 million are active during inference. This sparse Mixture-of-Experts model targets high-fidelity, zero-shot voice cloning and aims to overcome the usual trade-off between speech quality and speed. ZONOS2 processes raw UTF-8 bytes instead of phonemes, which improves support for multiple languages and code-switching, and it was trained on over 6 million hours of audio.
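As a rough illustration of the two technical points in the summary, the sketch below shows what "raw UTF-8 bytes instead of phonemes" means for mixed-language input, and how small the active share of a sparse model is. This is not ZONOS2's actual preprocessing or API: the function names `text_to_byte_ids` and `active_fraction` are hypothetical, and the only figures taken from the summary are the 8 billion total and 900 million active parameters.

```python
# Illustrative sketch only; not Zyphra's code. Function names are hypothetical.

def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 and return one integer ID (0-255) per byte.

    A byte-level model needs no phonemizer or per-language lexicon:
    any script maps into the same 256-symbol vocabulary.
    """
    return list(text.encode("utf-8"))


def active_fraction(active_params: float, total_params: float) -> float:
    """Share of the parameters that are active during inference."""
    return active_params / total_params


# Code-switched input: English followed by Chinese in one string.
mixed = "Hello, 世界"
ids = text_to_byte_ids(mixed)

# ASCII characters take one byte each; each of these CJK characters takes three.
print(len(mixed), "characters ->", len(ids), "bytes")

# Figures quoted in the summary: 900M active out of 8B total.
print(f"active share: {active_fraction(900e6, 8e9):.2%}")
```

The byte sequence is longer than the character sequence for non-ASCII text, which is the usual cost of a byte-level vocabulary in exchange for uniform handling of every language.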

IMPACT This sparse MoE TTS model offers high-fidelity voice cloning and real-time performance, potentially setting new benchmarks for expressive speech synthesis.

RANK_REASON The item describes the release of a new open-source TTS model with specific technical details and benchmark comparisons.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. r/LocalLLaMA · TIER_1 · English (EN) · /u/KokaOP

    ZONOS2: real-time TTS with 8B params, 900M active, and high-fidelity voice cloning

    https://www.reddit.com/r/LocalLLaMA/comments/1u4lk5c/zonos2_realtime_tts_with_8b_params_900m_active/