PulseAugur

New Thai voice cloning model surpasses human ground truth and commercial flagship models

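Researchers have developed JaiTTS-v1.0, a Thai voice cloning text-to-speech model that reports state-of-the-art results. On short-duration speech generation it reaches a character error rate (CER) of 1.94%, which is better than the CER measured on human ground-truth recordings. The model is adapted from VoxCPM and directly processes numerals and Thai-English code-switched text without explicit text normalization. In human evaluation, JaiTTS-v1.0 was preferred over commercial flagship models in the vast majority of pairwise comparisons. Separately, a second study focuses on cross-lingual voice cloning for scientific speech: it evaluates OmniVoice-based models and uses data augmentation to improve intelligibility while preserving speaker similarity across Arabic, Chinese, and French.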
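For readers unfamiliar with the metric, CER is conventionally the character-level edit distance between a reference transcript and the transcript recognized from the synthesized audio, divided by the reference length. The sketch below illustrates that general definition only; it is not the evaluation code used for JaiTTS. The function names `edit_distance` and `cer` are hypothetical, and counting one Unicode code point as one character is an assumption (the paper's own text normalization and character segmentation for Thai are not described in the excerpts here).

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty string
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete r from the reference
                curr[j - 1] + 1,     # insert h into the reference
                prev[j - 1] + cost,  # substitute r with h, or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # One substituted character out of four gives a CER of 25%.
    print(f"{cer('abcd', 'abxd'):.2%}")
```

Under this definition, a CER of 1.94% corresponds to roughly two character-level errors per hundred reference characters.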

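Impact: Advances in Thai voice cloning and cross-lingual speech synthesis could enable more natural and more accessible communication tools.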

Ranking rationale: This cluster contains two arXiv papers that describe new speech synthesis and voice cloning models.


AI-generated summary · Google Gemini · from 4 sources.


Sources [4]

  1. arXiv cs.CL TIER_1 English(EN) · Jullajak Karnjanaekarin, Pontakorn Trakuekul, Narongkorn Panitsrisit, Sumana Sumanakul, Vichayuth Nitayasomboon, Nithid Guntasin, Thanavin Denkavin, Attapol T. Rutherford ·

    JaiTTS: A Thai Voice Cloning Model

    arXiv:2604.27607v1 Announce Type: new Abstract: We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressiv…

  2. arXiv cs.CL TIER_1 English(EN) · Attapol T. Rutherford ·

    JaiTTS: A Thai Voice Cloning Model

    We present JaiTTS-v1.0, a state-of-the-art Thai voice cloning text-to-speech model built through continual training on a large Thai-centric speech corpus. The model architecture is adapted from VoxCPM, a tokenizer-free autoregressive TTS model. JaiTTS-v1.0 directly processes nume…

  3. arXiv cs.CL TIER_1 English(EN) · Amanuel Gizachew Abebe, Yasmin Moslem ·

    One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

    arXiv:2604.26136v1 Announce Type: cross Abstract: Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this pap…

  4. arXiv cs.CL TIER_1 English(EN) · Yasmin Moslem ·

    One Voice, Many Tongues: Cross-Lingual Voice Cloning for Scientific Speech

    Preserving a speaker's voice identity while generating speech in a different language remains a fundamental challenge in spoken language technology, particularly in specialized domains such as scientific communication. In this paper, we address this challenge through our system s…