Brief · PulseAugur

RESEARCH · Hugging Face Daily Papers · 4d · [3 sources]

dots.tts Technical Report

Researchers have introduced dots.tts, a 2-billion-parameter text-to-speech model that operates in a continuous latent space. The model combines three components: an AudioVAE that provides a structured speech representation, full-history conditioning for improved consistency, and self-corrective post-training for greater robustness. dots.tts achieves state-of-the-art results on benchmarks such as Seed-TTS-Eval, and MeanFlow distillation gives it efficient, low-latency generation.

IMPACT Sets a new state of the art on multilingual TTS benchmarks, which could improve voice cloning and emotional expressiveness in AI applications.

Hugging Face
dots.tts
Seed-TTS-Eval
AudioVAE