PulseAugur / Brief

last 24h
223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
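The Brief does not publish its scoring formula. The sketch below assumes one plausible form: a weighted sum of three signals normalized to 0..1, multiplied by an exponential time-decay factor and scaled to 0–100. The field names, weights, and 12-hour half-life are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClusterSignals:
    authority: float         # 0..1, hypothetical: reputation of the reporting sources
    cluster_strength: float  # 0..1, hypothetical: how many sources cover the story
    headline_signal: float   # 0..1, hypothetical: strength of headline cues
    age_hours: float         # hours since the story first appeared


# Hypothetical weights and half-life; not the Brief's actual parameters.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 12.0


def score(s: ClusterSignals) -> int:
    """Combine the signals into a 0-100 score with exponential time decay."""
    base = (
        WEIGHTS["authority"] * s.authority
        + WEIGHTS["cluster_strength"] * s.cluster_strength
        + WEIGHTS["headline_signal"] * s.headline_signal
    )
    decay = 0.5 ** (s.age_hours / HALF_LIFE_HOURS)  # halves every HALF_LIFE_HOURS
    return round(100 * min(max(base * decay, 0.0), 1.0))


print(score(ClusterSignals(0.9, 0.6, 0.4, age_hours=3.0)))
```

Multiplying by the decay factor, rather than adding a recency bonus, lets age dominate once a story is a few half-lives old, which suits a rolling 24-hour window.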

  1. End-to-End Training for Discrete Token LLM based TTS System

    Researchers have developed an end-to-end training framework for text-to-speech (TTS) systems built on a Large Language Model (LLM) that generates discrete speech tokens. Earlier cascaded systems train each component independently. This framework instead jointly trains the speech tokenizer, the LLM, a flow-matching model, and a reward model. Joint optimization pushes the discrete speech token space to capture both acoustic and semantic information, which improves the generated speech. In experiments, the end-to-end method achieves state-of-the-art results on the Seed-TTS-Eval benchmark while using a significantly smaller LLM.

    IMPACT This unified training approach could lead to more efficient and higher-quality speech synthesis models.
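    The summary does not give the training objective. As a hedged sketch, assume the four components are optimized through a single weighted loss. The weights $\lambda$ and the loss names below are hypothetical. Under that assumption, the contrast with a cascaded pipeline can be written as:

    ```latex
    % Cascaded: each component minimizes its own loss; the tokenizer is frozen afterwards.
    \theta_{\text{tok}}^{*} = \arg\min_{\theta_{\text{tok}}} \mathcal{L}_{\text{tok}}, \qquad
    \theta_{\text{LM}}^{*} = \arg\min_{\theta_{\text{LM}}} \mathcal{L}_{\text{LM}}\big(\theta_{\text{LM}};\, \theta_{\text{tok}}^{*}\big)

    % End-to-end (assumed weighted-sum form): one objective over all parameters,
    % so downstream losses also update the tokenizer.
    \mathcal{L}_{\text{joint}} = \lambda_{\text{tok}}\mathcal{L}_{\text{tok}} + \lambda_{\text{LM}}\mathcal{L}_{\text{LM}} + \lambda_{\text{FM}}\mathcal{L}_{\text{FM}} + \lambda_{\text{R}}\mathcal{L}_{\text{R}},
    \qquad \frac{\partial \mathcal{L}_{\text{LM}}}{\partial \theta_{\text{tok}}} \neq 0 \text{ in general}

    % Straight-through estimator (an assumed choice, common for discrete codebooks):
    % the forward pass uses q(e); the backward pass treats q as the identity.
    \hat{e} = e + \operatorname{sg}\big(q(e) - e\big)
    ```

    Here $q$ maps an encoder output $e$ to its nearest codebook entry, and $\operatorname{sg}$ stops gradients. Some estimator of this kind is needed because token selection is discrete. The summary does not say which one the authors use.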

  2. dots.tts Technical Report

    Researchers have introduced dots.tts, a 2-billion-parameter text-to-speech model that operates in a continuous latent space. The model introduces three main techniques. An AudioVAE provides a structured speech representation. Full-history conditioning improves consistency. Self-corrective post-training improves robustness. dots.tts achieves state-of-the-art results on benchmarks such as Seed-TTS-Eval, and MeanFlow distillation gives it efficient, low-latency generation.

    IMPACT Sets a new state of the art on multilingual TTS benchmarks, which could improve voice cloning and emotional expressiveness in AI applications.
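MeanFlow, as published, trains a network to predict the average velocity u(z_t, r, t) = 1/(t - r) ∫_r^t v(z_τ, τ) dτ over an interval, rather than the instantaneous velocity v. A sample can then be produced with a single jump z_r = z_t - (t - r) u. The summary does not describe how dots.tts distills its model into this form. The toy sketch below only illustrates the average-velocity idea. It uses a hypothetical one-dimensional velocity field in place of a trained network and computes u by numerical integration instead of learning it.

```python
import math


def velocity(z: float, tau: float) -> float:
    """Hypothetical toy velocity field standing in for a trained flow model."""
    return z * math.cos(3.0 * tau)


def integrate(z_t: float, t: float, r: float, steps: int = 10_000) -> tuple[float, float]:
    """Euler-integrate dz/dtau = velocity(z, tau) backward from time t to time r.

    Returns (z_r, u), where u approximates the average velocity
    u(z_t, r, t) = 1 / (t - r) * integral_r^t velocity(z_tau, tau) dtau.
    """
    dt = (t - r) / steps
    z, tau, total = z_t, t, 0.0
    for _ in range(steps):
        v = velocity(z, tau)
        total += v * dt
        z -= v * dt  # move one small step from tau toward r
        tau -= dt
    return z, total / (t - r)


z_t, t, r = 1.0, 1.0, 0.0

# Many small steps with the instantaneous velocity (slow, accurate).
z_r_multi, u = integrate(z_t, t, r)

# One step with the average velocity: z_r = z_t - (t - r) * u.
z_r_mean = z_t - (t - r) * u

# One step with the instantaneous velocity at time t (fast, inaccurate).
z_r_naive = z_t - (t - r) * velocity(z_t, t)

print(f"multi-step: {z_r_multi:.4f}  mean-velocity jump: {z_r_mean:.4f}  naive jump: {z_r_naive:.4f}")
```

In MeanFlow itself the network is trained to output u directly, so the integration loop disappears at inference time. The loop is kept here only to show that a one-step jump with u matches the many-step result, while a jump with the instantaneous velocity does not.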