dots.tts Technical Report
Researchers have introduced dots.tts, a 2-billion-parameter text-to-speech model that operates in a continuous latent space. The model incorporates several innovations: an AudioVAE that provides a structured speech representation, full-history conditioning for improved consistency, and self-corrective post-training for enhanced robustness. It achieves state-of-the-art results on benchmarks such as Seed-TTS-Eval and supports efficient, low-latency generation through MeanFlow distillation.
AI IMPACT: Sets a new state of the art on multilingual TTS benchmarks, potentially improving voice cloning and emotional expressiveness in AI applications.
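For context on the MeanFlow distillation mentioned above: a standard flow-matching model learns an instantaneous velocity and typically needs many integration steps at inference, whereas a MeanFlow model learns the average velocity over a time interval, which allows sampling in one or a few network evaluations. The relations below follow the general MeanFlow formulation and assume the common convention that t = 1 is Gaussian noise and t = 0 is data. The symbols z_t (latent at time t), v (instantaneous velocity) and u (average velocity) are illustrative; the summary does not state how dots.tts sets up its distillation or how many sampling steps it uses.

```latex
\begin{aligned}
u(z_t, r, t) &= \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau
  && \text{average velocity over } [r, t] \\
z_r &= z_t - (t - r)\, u(z_t, r, t)
  && \text{jump from time } t \text{ to time } r \\
z_0 &= z_1 - u(z_1, 0, 1), \quad z_1 \sim \mathcal{N}(0, I)
  && \text{one-step sampling}
\end{aligned}
```

The last line is the one-step case: a single evaluation of the learned average velocity maps noise directly to a clean latent. In a continuous-latent TTS system, that latent would then presumably be decoded to a waveform, here most likely by the AudioVAE decoder, though the summary does not say so explicitly.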