Brief

last 24h

[3/3] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
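The feed names four scoring factors but not how they are combined. As a rough illustration only, the sketch below assumes three signals normalized to [0, 1], a weighted sum, and an exponential time decay with a 24-hour half-life. The weights, the half-life, and the function name `score_item` are all hypothetical.

```python
# Hypothetical weights: the feed lists the factors, not their relative importance.
WEIGHTS = {"authority": 0.5, "cluster_strength": 0.25, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0  # assumed: an item loses half its score every 24 hours


def clamp01(value: float) -> float:
    """Clip a signal into the [0, 1] range."""
    return min(max(value, 0.0), 1.0)


def score_item(authority: float, cluster_strength: float,
               headline_signal: float, age_hours: float) -> float:
    """Return a 0-100 score from three [0, 1] signals and the item's age."""
    signals = {
        "authority": authority,
        "cluster_strength": cluster_strength,
        "headline_signal": headline_signal,
    }
    # Weighted sum of the clamped signals; weights sum to 1, so base is in [0, 1].
    base = sum(WEIGHTS[name] * clamp01(value) for name, value in signals.items())
    # Exponential decay: factor 1.0 at age 0, 0.5 after one half-life.
    decay = 0.5 ** (max(age_hours, 0.0) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumptions, a maximally strong item scores 100 when fresh and 50 one day later. A real ranker could combine the factors quite differently.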

TOOL · arXiv cs.AI English(EN) · 17h

End-to-End Training for Discrete Token LLM based TTS System

Researchers have developed an end-to-end training framework for discrete-token Large Language Model (LLM) based Text-to-Speech (TTS) systems. The approach jointly trains the speech tokenizer, the LLM, a flow-matching model, and a reward model, whereas previous cascaded systems trained these components independently. The joint optimization encourages the discrete speech token space to better capture acoustic and semantic information, which improves TTS generation. Experiments show that the end-to-end method achieves state-of-the-art results on the Seed-TTS-Eval benchmark with a significantly smaller LLM.

IMPACT This unified training approach could lead to more efficient and higher-quality speech synthesis models.
- LLM
- Seed-TTS-Eval

TOOL · r/LocalLLaMA English(EN) · 6d

Tongyi labs quietly released chart topping stt & tts models. Anyone know if they'll be open weights?

Tongyi Labs has released new speech-to-text (STT) and text-to-speech (TTS) models that are reportedly topping charts. The models were released without significant fanfare, leading to community questions about whether they will be open-weight releases. This release marks a significant development in speech technology from the lab.

IMPACT New STT/TTS models from Tongyi Labs could advance speech technology capabilities.
- Tongyi Labs

RESEARCH · arXiv cs.AI English(EN) · 1w · [3 sources]

Efficient ASR Training with Conversations that Never Happened

Researchers have developed a method to improve Automatic Speech Recognition (ASR) training for low-resource languages by generating synthetic conversational data. The pipeline uses LLMs to create dialogues, maps speaker attributes to TTS voice profiles, and assembles the synthesized turns into simulated conversations. Evaluations on the Hungarian BEA-Dialogue benchmark showed that this synthetic data significantly improves ASR performance, even outperforming models trained on much larger real datasets.

IMPACT Synthetic data generation via LLMs and TTS offers a scalable solution for improving ASR in low-resource languages.
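The three pipeline stages described in the summary (dialogue generation, speaker-to-voice mapping, conversation assembly) can be sketched as below. This is only an illustration of the data flow, not the paper's implementation. The LLM and TTS calls are replaced by stand-in functions, and every name here (`generate_dialogue`, `pick_voice`, `synthesize`, `assemble`, the voice profile IDs) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    speaker: str
    text: str


# Hypothetical voice profiles keyed by (gender, age_group).
VOICE_PROFILES = {
    ("female", "adult"): "voice_f_adult",
    ("male", "senior"): "voice_m_senior",
}

# Hypothetical speaker attributes for a two-party dialogue.
SPEAKERS = {
    "A": {"gender": "female", "age_group": "adult"},
    "B": {"gender": "male", "age_group": "senior"},
}


def generate_dialogue(topic: str) -> list[Turn]:
    """Stand-in for an LLM call that writes a dialogue about a topic."""
    return [
        Turn("A", f"Have you heard about {topic}?"),
        Turn("B", "Yes, tell me more."),
    ]


def pick_voice(attrs: dict) -> str:
    """Map speaker attributes to a TTS voice profile, with a default fallback."""
    return VOICE_PROFILES.get((attrs["gender"], attrs["age_group"]), "voice_default")


def synthesize(text: str, voice: str) -> float:
    """Stand-in for TTS: returns an assumed duration of 0.4 s per word."""
    return round(len(text.split()) * 0.4, 2)


def assemble(turns: list[Turn], speakers: dict, gap: float = 0.3) -> list[dict]:
    """Lay synthesized turns on a timeline with a fixed pause between them."""
    timeline, cursor = [], 0.0
    for turn in turns:
        voice = pick_voice(speakers[turn.speaker])
        duration = synthesize(turn.text, voice)
        timeline.append({
            "speaker": turn.speaker,
            "voice": voice,
            "start": round(cursor, 2),
            "end": round(cursor + duration, 2),
            "text": turn.text,
        })
        cursor += duration + gap
    return timeline


timeline = assemble(generate_dialogue("the weather"), SPEAKERS)
```

Each timeline entry pairs a transcript with a time span and a voice, which is the shape of labeled data an ASR training set needs. A real pipeline would return audio from the TTS step rather than an estimated duration.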
