Contextualized embeddings predict the duration and pitch of Mandarin words

By the PulseAugur editorial team · [2 sources] · 2026-07-02 10:38

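Researchers have developed a method that uses contextualized embeddings (CEs) to predict the spoken duration and pitch of Mandarin monosyllabic words. The study shows that CEs predict word duration accurately, even at the level of individual tokens, outperforming both a random baseline and a permutation baseline. The predicted durations are sufficient to reconstruct pitch contours; the reconstructed contours approximate the empirical contours and also outperform the permutation baseline.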
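
The comparison against a permutation baseline mentioned in the summary can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the ridge regression stands in for whatever mapping the authors actually use, and the names (`fit_ridge`, `heldout_correlation`, `permutation_baseline`) are hypothetical. The idea is only to show why beating a permutation baseline indicates that the embeddings carry real information about duration.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression. For simplicity the intercept column
    is penalized together with the other weights."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    return np.linalg.solve(A, Xb.T @ y)

def predict(X, w):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ w

def heldout_correlation(X_train, y_train, X_test, y_test, alpha=1.0):
    """Fit on the training tokens, then correlate predicted and observed
    values on held-out tokens."""
    w = fit_ridge(X_train, y_train, alpha)
    return float(np.corrcoef(predict(X_test, w), y_test)[0, 1])

def permutation_baseline(X_train, y_train, X_test, y_test, n_perm=200, seed=0):
    """Refit after shuffling the training durations, which breaks the
    link between each embedding and its duration."""
    rng = np.random.default_rng(seed)
    scores = [
        heldout_correlation(X_train, rng.permutation(y_train), X_test, y_test)
        for _ in range(n_perm)
    ]
    return np.array(scores)

# Synthetic stand-in data (NOT the paper's corpus): 600 tokens with
# 32-dimensional "embeddings" and a noisy linear link to log duration.
rng = np.random.default_rng(42)
n_tokens, dim, split = 600, 32, 450
X = rng.normal(size=(n_tokens, dim))
true_w = rng.normal(size=dim)
log_duration = 0.1 * (X @ true_w) + rng.normal(scale=0.2, size=n_tokens)

observed = heldout_correlation(
    X[:split], log_duration[:split], X[split:], log_duration[split:]
)
baseline = permutation_baseline(
    X[:split], log_duration[:split], X[split:], log_duration[split:]
)
# One-sided permutation p-value with the usual +1 correction.
p_value = (np.sum(baseline >= observed) + 1) / (len(baseline) + 1)
print(f"held-out r = {observed:.3f}, permutation p = {p_value:.4f}")
```

If the embeddings were uninformative, the observed held-out correlation would fall inside the spread of the shuffled scores; a value well above all of them is the kind of evidence the summary describes.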

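Impact: By predicting prosodic features more accurately, this research could help improve speech synthesis and speech recognition systems.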

Ranking rationale: An academic paper detailing a new method that uses embeddings to predict linguistic features.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Xiaoyun Jin, Mirjam Ernestus, R. Harald Baayen · 2026-07-03 04:00

Using embeddings to predict spoken word duration and pitch in Mandarin monosyllabic words

arXiv:2607.02002v1 · Abstract: Time-normalized f0 contours of Mandarin words in conversational speech have been shown to be predictable in part from their contextualized embeddings (CEs). The present study investigates whether CEs also predict spoken word duratio…

arXiv cs.CL TIER_1 English(EN) · R. Harald Baayen · 2026-07-02 10:38

Using embeddings to predict spoken word duration and pitch in Mandarin monosyllabic words

Time-normalized f0 contours of Mandarin words in conversational speech have been shown to be predictable in part from their contextualized embeddings (CEs). The present study investigates whether CEs also predict spoken word duration for 7470 tokens of Mandarin monosyllabic CV wo…