New TTS method boosts emotion control accuracy by 12%

By PulseAugur Editorial Team · [1 source] · 2026-06-11 04:00

Researchers have developed a method called Cross-modal Consistency Guided Classifier-Free Guidance (CCG-CFG) to improve emotion control in auto-regressive Text-to-Speech (TTS) models. The technique dynamically adjusts the guidance scale according to how strongly the emotion implied by the text conflicts with the desired speech emotion, which improves emotional alignment. Applied to the CosyVoice2 model, the approach improved emotion recognition accuracy and subjective quality scores, outperforming existing methods such as HierSpeech++ and Qwen3-TTS.
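The idea of scaling guidance by text-emotion conflict can be illustrated with a minimal sketch. This is not the paper's formulation: the conflict measure, the linear scale schedule, the `w_min`/`w_max` bounds, and all function names below are hypothetical choices made for illustration. Only the standard classifier-free guidance mix of conditional and unconditional logits is a well-known technique.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def conflict_score(text_emotion_probs, target_index):
    """Hypothetical conflict measure (an assumption, not from the paper):
    1 minus the probability that a text-emotion classifier assigns to the
    target emotion. 0 means the text already matches the target emotion."""
    return 1.0 - text_emotion_probs[target_index]


def dynamic_scale(conflict, w_min=1.0, w_max=3.0):
    """Assumed linear schedule: more conflict gives stronger guidance.
    The conflict value is clamped to [0, 1] first."""
    conflict = min(max(conflict, 0.0), 1.0)
    return w_min + (w_max - w_min) * conflict


def cfg_logits(cond_logits, uncond_logits, scale):
    """Standard classifier-free guidance on next-token logits:
    uncond + scale * (cond - uncond). A scale of 1 returns cond unchanged."""
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


if __name__ == "__main__":
    # Toy case: the text reads as "happy" but the requested emotion is "sad".
    text_probs = [0.8, 0.1, 0.1]  # happy, sad, neutral (made-up numbers)
    scale = dynamic_scale(conflict_score(text_probs, target_index=1))

    cond = [2.0, 1.0, 0.0]    # logits with the emotion instruction (made up)
    uncond = [1.0, 1.0, 1.0]  # logits without the instruction (made up)
    print(scale, softmax(cfg_logits(cond, uncond, scale)))
```

In this sketch a matching text and target leave the conditional logits untouched, while a strong mismatch pushes the token distribution further toward what the emotion instruction prefers.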

Impact: Enhances TTS expressiveness and accuracy, potentially leading to more natural and emotionally resonant AI-generated speech.

Ranking rationale: The cluster contains a research paper detailing a new method for TTS emotion control.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Yizhou Peng, Yukun Ma, Chong Zhang, Yi-Wen Chao, Chongjia Ni, Bin Ma, Eng Siong Chng · 2026-06-11 04:00

Cross-modal Consistency Guidance for Robust Emotion Control in Auto-Regressive TTS Models

arXiv:2510.13293v4 Announce Type: replace Abstract: While Text-to-Speech (TTS) systems enable emotional control via natural-language instructions, expressiveness, naturalness, and speech quality degrade when the target emotion conflicts with the textual semantics. We propose a Cr…
