Researchers have developed JaiTTS-v1.0, a Thai voice cloning text-to-speech model that reports state-of-the-art results. On short-duration speech generation it reaches a character error rate (CER) of 1.94%, surpassing the human ground truth. The model, adapted from VoxCPM, processes numerals and Thai-English code-switching directly, without explicit text normalization. In human evaluations, JaiTTS-v1.0 was preferred over commercial flagship models in a significant majority of pairwise comparisons. Separately, another research effort focused on cross-lingual voice cloning for scientific speech. It evaluated models based on OmniVoice and used data augmentation to improve intelligibility while maintaining speaker similarity across Arabic, Chinese, and French.
Summary written by gemini-2.5-flash-lite from 4 sources.
IMPACT Advances in Thai voice cloning and cross-lingual speech synthesis could enable more natural and accessible communication tools.
RANK_REASON The cluster contains two arXiv papers detailing new speech synthesis and voice cloning models.