PulseAugur

New Thai voice cloning model outperforms human ground truth and commercial flagships

Researchers have developed JaiTTS-v1.0, a Thai voice cloning text-to-speech model that achieves a character error rate (CER) of 1.94% on short-duration speech generation, a state-of-the-art result that surpasses the human ground truth. The model, adapted from VoxCPM, directly processes numerals and Thai-English code-switching without explicit text normalization. In human evaluations, JaiTTS-v1.0 outperformed commercial flagship models in a significant majority of pairwise comparisons. A separate research effort addresses cross-lingual voice cloning for scientific speech: it evaluates OmniVoice-based models and uses data augmentation to improve intelligibility while maintaining speaker similarity across Arabic, Chinese, and French.
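For readers unfamiliar with the metric, the sketch below shows how a character error rate is conventionally computed: the character-level edit distance between a reference text and a transcript, divided by the reference length. It is a minimal illustration only. The function names are hypothetical, and the summary does not say which speech recognizer or text normalization the JaiTTS authors used to obtain transcripts, so this should not be read as their evaluation pipeline.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning i reference characters into an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Three edits against a six-character reference.
    print(cer("kitten", "sitting"))
```

Under this definition, a CER of 1.94% corresponds to roughly two character errors per hundred reference characters. Python counts Unicode code points here, so Thai vowel and tone marks are each treated as separate characters.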

Summary written by gemini-2.5-flash-lite from 4 sources.

IMPACT Advances in Thai voice cloning and cross-lingual speech synthesis could enable more natural and accessible communication tools.

RANK_REASON The cluster contains two arXiv papers detailing new speech synthesis and voice cloning models.


COVERAGE [4]

  1. arXiv cs.CL TIER_1 · Jullajak Karnjanaekarin, Pontakorn Trakuekul, Narongkorn Panitsrisit, Sumana Sumanakul, Vichayuth Nitayasomboon, Nithid Guntasin, Thanavin Denkavin, Attapol T. Rutherford

    JaiTTS: A Thai Voice Cloning Model

    arXiv:2604.27607v1. We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressiv…

  2. arXiv cs.CL TIER_1 · Attapol T. Rutherford

    JaiTTS: A Thai Voice Cloning Model

    We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressive TTS model. JaiTTS-v1.0 directly processes nume…

  3. arXiv cs.CL TIER_1 · Amanuel Gizachew Abebe, Yasmin Moslem

    One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

    arXiv:2604.26136v1. Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this pap…

  4. arXiv cs.CL TIER_1 · Yasmin Moslem

    One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

    Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system s…