Synthesizing the Lombard Effect: Multi-Level Control of Speech Clarity and Vocal Effort in TTS

New TTS model simulates the human Lombard effect to improve speech clarity

By the PulseAugur editorial team · [1 source] · 2026-06-22 11:14

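Researchers have developed a new text-to-speech (TTS) model that simulates the Lombard effect, the tendency of humans to speak louder and more clearly in noisy environments. The model uses flow matching and pseudo-labels to control vocal effort and articulation, giving continuous control over both speech characteristics. This enables word-level emphasis and improves clarity and intelligibility under simulated noisy conditions.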
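To make the idea of flow matching with a continuous, word-level control signal more concrete, here is a minimal sketch in Python. It is not the paper's implementation: it assumes the common straight-line (linear interpolation) flow-matching path, and every name in it (`expand_word_controls`, `flow_matching_pair`, `euler_sample`, `toy_velocity`) is hypothetical. A toy closed-form velocity field stands in for the trained network.

```python
import numpy as np

def expand_word_controls(word_effort, frames_per_word):
    """Repeat each word's effort value over that word's frames (hypothetical helper)."""
    return np.repeat(np.asarray(word_effort, dtype=float), frames_per_word)

def flow_matching_pair(x1, rng):
    """Build one training example on the straight-line path from noise x0 to data x1."""
    x0 = rng.standard_normal(x1.shape)   # noise sample
    t = rng.uniform()                    # random time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1         # point on the path at time t
    target_velocity = x1 - x0            # what a velocity network would regress to
    return xt, t, target_velocity

def euler_sample(velocity_fn, x0, cond, steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt, cond)
    return x

def toy_velocity(x, t, cond):
    """Stand-in for a trained model: moves every frame toward its effort value."""
    return (cond - x) / (1.0 - t)

rng = np.random.default_rng(0)
# Three words; the second is emphasized (assumed effort scale of 0 to 1).
effort = expand_word_controls([0.2, 0.9, 0.2], [3, 4, 2])
generated = euler_sample(toy_velocity, rng.standard_normal(effort.shape), effort)
```

In a real system the conditioning would also include text and speaker information, and the velocity field would be a neural network trained on targets like those returned by `flow_matching_pair`. The sketch only shows how a per-word control value can be expanded to frames and fed into sampling.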

Impact: This research may help make synthesized speech more natural and easier to understand in noisy environments.

Ranking rationale: This is an academic paper detailing a new TTS model.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Alexander Waibel · 2026-06-22 11:14

Synthesizing the Lombard Effect: Multi-Level Control of Speech Clarity and Vocal Effort in TTS

Humans tend to speak louder and more clearly in challenging environments, such as noisy conditions or when addressing hearing-impaired listeners, a phenomenon known as the Lombard effect. To simulate this behavior in speech synthesis systems, we introduce a flow-matching based text-to-speech (TT…