New 2B-parameter TTS model dots.tts achieves SOTA

By PulseAugur Editorial · [3 sources] · 2026-06-05 00:00

Researchers have introduced dots.tts, a 2-billion-parameter text-to-speech model that generates speech autoregressively in a continuous latent space. The model has three main innovations: an AudioVAE that provides a structured speech representation, full-history conditioning for improved consistency, and self-corrective post-training for enhanced robustness. It achieves state-of-the-art results on benchmarks such as Seed-TTS-Eval and supports efficient, low-latency generation through MeanFlow distillation.

IMPACT Sets new SOTA on multilingual TTS benchmarks, potentially improving voice cloning and emotional expressiveness in AI applications.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.AI TIER_1 English(EN) · Shi Lian, Changtao Li, Bohan Li, Hankun Wang, Da Zheng, Junfeng Tian, Yufeng Ma, Colin Zhang, Kai Yu · 2026-06-08 04:00

dots.tts Technical Report

arXiv:2606.07080v1 (cross-listed). Abstract: We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are …

arXiv cs.AI TIER_1 English(EN) · Kai Yu · 2026-06-05 09:19

dots.tts Technical Report

We present dots.tts, a 2B-parameter continuous autoregressive text-to-speech (TTS) foundation model that models speech in a continuous latent space. Compared with existing continuous autoregressive models, our key innovations are threefold. First, we train an AudioVAE with multip…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-05 00:00

dots.tts Technical Report

A 2B-parameter continuous autoregressive text-to-speech model trained on a multilingual corpus achieves state-of-the-art performance on multiple benchmarks while enabling efficient low-latency speech generation through specialized distillation techniques.
