ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

ZeSTA framework improves personalized speech synthesis through zero-shot TTS augmentation

By the PulseAugur editorial team · [1 source] · 2026-06-19 04:00

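Researchers have developed ZeSTA, a new framework that uses zero-shot text-to-speech (ZS-TTS) as a data augmentation source to improve personalized speech synthesis. The method addresses a common problem: speaker similarity degrades when synthetic and real speech data are mixed during fine-tuning. ZeSTA uses domain-conditioned training to distinguish real speech from synthetic speech, and combines it with oversampling of the real data to stabilize adaptation, particularly in low-resource scenarios.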
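The data side of the two ideas above (a domain label on every utterance, plus oversampling of the real recordings) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `Utterance`, `build_training_pool`, and `domain_counts`, the domain IDs, and the oversampling factor are all hypothetical, and the summary does not say how the domain label is fed into the model.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical domain IDs; a model would condition on these during fine-tuning.
REAL, SYNTHETIC = 0, 1


@dataclass(frozen=True)
class Utterance:
    text: str
    audio_path: str
    domain: int  # REAL or SYNTHETIC


def build_training_pool(real, synthetic, real_oversample=4):
    """Tag each (text, audio_path) pair with its domain and repeat real ones.

    real_oversample is an assumed hyperparameter: each real utterance
    appears that many times so the scarce real data is not swamped by
    the larger synthetic set.
    """
    if real_oversample < 1:
        raise ValueError("real_oversample must be at least 1")
    pool = [Utterance(t, p, REAL) for t, p in real] * real_oversample
    pool += [Utterance(t, p, SYNTHETIC) for t, p in synthetic]
    return pool


def domain_counts(pool):
    """Count how many pool entries belong to each domain."""
    return Counter(u.domain for u in pool)


if __name__ == "__main__":
    real = [("hello there", "real_000.wav")]
    synthetic = [("good morning", "syn_000.wav"), ("see you soon", "syn_001.wav")]
    pool = build_training_pool(real, synthetic, real_oversample=3)
    print(domain_counts(pool))
```

A training loop would then draw shuffled batches from the pool and pass each utterance's `domain` to the model alongside the text and audio.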

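Impact: This research could lead to more efficient and more effective personalized speech generation, particularly when training data is limited.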

Ranking rationale: This cluster contains an academic paper detailing a new method for speech synthesis.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI TIER_1 English(EN) · Youngwon Choi, Jinwoo Oh, Hwayeon Kim, Hyeonyu Kim · 2026-06-19 04:00

ZeSTA: Zero-Shot TTS Augmentation with Domain-Conditioned Training for Data-Efficient Personalized Speech Synthesis

arXiv:2603.04219v2 Announce Type: replace-cross Abstract: We investigate the use of zero-shot text-to-speech (ZS-TTS) as a data augmentation source for low-resource personalized speech synthesis. While synthetic augmentation can provide linguistically rich and phonetically divers…
