New Thai voice cloning model outperforms human ground truth and commercial flagships

By PulseAugur Editorial · [4 sources] · 2026-04-28 21:47

Researchers have developed JaiTTS-v1.0, a Thai voice cloning text-to-speech model that reports state-of-the-art results. On short-duration speech generation it reaches a character error rate (CER) of 1.94%, which is reported as surpassing human ground truth. The model, adapted from VoxCPM, a tokenizer-free autoregressive TTS model, processes numerals and Thai-English code-switching directly, without explicit text normalization. In human evaluations, JaiTTS-v1.0 outperformed commercial flagship models in a significant majority of pairwise comparisons. Separately, a second paper addresses cross-lingual voice cloning for scientific speech: it evaluates models based on OmniVoice and uses data augmentation to improve intelligibility while maintaining speaker similarity across Arabic, Chinese, and French.
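For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between a reference text and a transcript of the generated audio, divided by the reference length. The sketch below assumes that standard definition. The function names `edit_distance` and `cer` are illustrative, and the ASR system and any text normalization used in the JaiTTS evaluation are not described in this summary, so this is not the authors' evaluation code.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted per code point.

    Assumption: characters are Python code points. For Thai, vowel and
    tone marks are separate code points, so a real evaluation may count
    differently (for example, per grapheme cluster).
    """
    prev = list(range(len(hyp) + 1))  # distance from empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to empty hyp prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference text must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical reference text and ASR transcript, for illustration only.
    reference = "hello world"
    transcript = "hello word"
    print(f"CER = {cer(reference, transcript):.2%}")
```

Under this definition, a CER of 1.94% corresponds to roughly two character errors per hundred reference characters.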

IMPACT Advances in Thai voice cloning and cross-lingual speech synthesis could enable more natural and accessible communication tools.

RANK_REASON The cluster contains two arXiv papers detailing new speech synthesis and voice cloning models.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

arXiv cs.CL TIER_1 English(EN) · Jullajak Karnjanaekarin, Pontakorn Trakuekul, Narongkorn Panitsrisit, Sumana Sumanakul, Vichayuth Nitayasomboon, Nithid Guntasin, Thanavin Denkavin, Attapol T. Rutherford · 2026-05-01 04:00

JaiTTS: A Thai Voice Cloning Model

arXiv:2604.27607v1 Announce Type: new Abstract: We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressiv…

arXiv cs.CL TIER_1 English(EN) · Attapol T. Rutherford · 2026-04-30 08:59

JaiTTS: A Thai Voice Cloning Model

We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressive TTS model. JaiTTS-v1.0 directly processes nume…

arXiv cs.CL TIER_1 English(EN) · Amanuel Gizachew Abebe, Yasmin Moslem · 2026-04-30 04:00

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

arXiv:2604.26136v1 Announce Type: cross Abstract: Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this pap…

arXiv cs.CL TIER_1 English(EN) · Yasmin Moslem · 2026-04-28 21:47

One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system s…

