PulseAugur
LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

LASE improves cross-script voice cloning by making speaker embeddings language-agnostic

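Researchers have developed LASE (Language-Adversarial Speaker Encoding), a speaker encoder intended to improve multilingual voice cloning. Standard encoders struggle to preserve speaker identity across scripts, particularly when non-Indic speech is mapped into Indic languages. LASE is trained with a combination of a supervised contrastive loss and a gradient-reversal cross-entropy objective, producing embeddings that carry little information about the language but remain informative about the speaker. The method substantially narrows the cross-script identity gap and improves cross-script speaker recall while using considerably less training data.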
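
The two ingredients named in the summary can be illustrated with a minimal pure-Python sketch. This is not the paper's implementation: the names `supcon_loss` and `GradientReversal`, the temperature of 0.1, the weight `lam`, and the toy 2-D embeddings are all assumptions made for illustration. The supervised contrastive term pulls together embeddings that share a speaker label, and the gradient-reversal layer passes features to a language classifier unchanged while flipping the sign of the gradient flowing back into the encoder, so the encoder is pushed to discard language cues.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, speaker_ids, temperature=0.1):
    """Supervised contrastive loss over one batch (illustrative sketch).

    For each anchor, same-speaker items are positives; every other item in
    the batch appears in the softmax denominator.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and speaker_ids[j] == speaker_ids[i]]
        if not positives:
            continue  # an anchor with no positive contributes nothing
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


class GradientReversal:
    """Identity in the forward pass; gradient scaled by -lam going backward."""

    def __init__(self, lam=1.0):
        self.lam = lam  # hypothetical weight of the adversarial term

    def forward(self, x):
        return list(x)

    def backward(self, grad_output):
        return [-self.lam * g for g in grad_output]


# Toy batch: items 0 and 1 lie close together, as do items 2 and 3.
batch = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_by_speaker = supcon_loss(batch, ["a", "a", "b", "b"])  # clusters match speakers
loss_mismatched = supcon_loss(batch, ["a", "b", "a", "b"])  # clusters split speakers
```

In this toy batch the loss is lower when the two geometric clusters coincide with the speaker labels than when each speaker is split across clusters, which is the behaviour the contrastive term rewards. In a real training loop the reversal would normally be handled by an autodiff framework rather than written by hand.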

Impact: Improves the accuracy of cross-script voice cloning, which could enable more seamless multilingual TTS systems.

Ranking rationale: This cluster contains one arXiv preprint detailing a new speaker-encoding method for multilingual voice cloning.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Venkata Pushpak Teja Menta ·

    LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

    arXiv:2605.00777v1 Announce Type: cross Abstract: A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pa…

  2. arXiv cs.CL TIER_1 English(EN) · Venkata Pushpak Teja Menta ·

    LASE: Language-Adversarial Speaker Encoding for Indic Cross-Script Identity Preservation

    A speaker encoder used in multilingual voice cloning should treat the same speaker identically regardless of which script the audio was uttered in. Off-the-shelf encoders do not, and the failure is accent-conditional. On a 1043-pair Western-accented voice corpus across English, H…