How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

New method reveals how style instructions shape text-to-speech output

By PulseAugur Editorial · [2 sources] · 2026-06-18 17:47

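Researchers have developed a new method for understanding how natural language instructions influence the output of style-captioned text-to-speech (TTS) systems. By applying the DAAM framework to a speech diffusion model, the study analyzes how specific words in a style caption shape the generated waveform. The results indicate that style tokens show lower temporal variance than content tokens, and that their influence peaks in the early stages of generation and in the deeper layers of the model.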
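To make the "temporal variance" comparison concrete, here is a minimal sketch of how a per-token attribution map could be aggregated from cross-attention weights and then summarized by its variance over time. It is not the paper's implementation: the nested-list layout `maps[layer][head][frame][token]`, the function names `aggregate_attention` and `temporal_variance`, the choice of averaging over layers and heads, and the toy weights are all assumptions made for illustration, and the numbers are not results from the study.

```python
from statistics import pvariance


def aggregate_attention(maps):
    """Average cross-attention weights over layers and heads.

    Assumed (hypothetical) layout: maps[layer][head][frame][token] -> weight.
    Returns heat[token][frame], one attribution curve per caption token.
    """
    n_layers = len(maps)
    n_heads = len(maps[0])
    n_frames = len(maps[0][0])
    n_tokens = len(maps[0][0][0])
    heat = [[0.0] * n_frames for _ in range(n_tokens)]
    for layer in maps:
        for head in layer:
            for f, row in enumerate(head):
                for t, weight in enumerate(row):
                    heat[t][f] += weight / (n_layers * n_heads)
    return heat


def temporal_variance(heat):
    """Population variance of each token's attribution across frames."""
    return [pvariance(frames) for frames in heat]


# Toy data, invented for illustration only: 1 layer, 2 identical heads,
# 4 audio frames, 3 caption tokens. Token 0 plays the role of a "style"
# word that is attended to evenly over time; tokens 1 and 2 play the role
# of "content" words whose attention moves from frame to frame.
head = [
    [0.4, 0.6, 0.0],
    [0.4, 0.0, 0.6],
    [0.4, 0.6, 0.0],
    [0.4, 0.0, 0.6],
]
toy_maps = [[head, head]]

heat = aggregate_attention(toy_maps)
variances = temporal_variance(heat)
print(variances)
```

In this toy setup the evenly attended token has a flat attribution curve and therefore the lowest variance, which is the kind of pattern the summary describes for style tokens.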

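Impact: Provides a deeper understanding of the controllability of expressive TTS systems, which could lead to improved speech generation.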

Ranking rationale: Academic paper detailing a new method for analyzing TTS models.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Nityanand Mathur, Hamees Sayed, Wasim Madha, Apoorv Singh, Sameer Khurana, Akshat Mandloi, Sudarshan Kamath · 2026-06-19 04:00

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

arXiv:2606.20532v1. Abstract: Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improv…

arXiv cs.AI TIER_1 English(EN) · Sudarshan Kamath · 2026-06-18 17:47

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improving controllability in expressive TTS. We propos…