On the Effect of Segmentation Width and Cluster Size on Speech Resynthesis and Continuation in Generative Spoken Language Models

Generative Spoken Language Models Achieve High-Quality Speech at Lower Bitrates

By the PulseAugur editorial team · 1 source · 2026-06-22 12:58

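A new research paper on arXiv examines how well Generative Spoken Language Modeling (GSLM) performs on speech synthesis and continuation. The study investigates how different segmentation widths and k-means cluster sizes affect speech quality across a range of bitrates. The researchers find that intelligible, natural-sounding speech can be produced at lower bitrates than previously assumed, and that speech continuation quality remains stable as the bitrate drops. The paper suggests that current GSLM setups may be more complex than necessary, and it calls for better automatic evaluation methods, since LLM-based metrics still correlate poorly with human subjective ratings.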
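To make the bitrate trade-off more concrete: if speech is cut into fixed-width segments and each segment is mapped to one of K k-means clusters, the resulting unit stream costs roughly (1000 / width in ms) × log2(K) bits per second, before any deduplication or entropy coding. The sketch below computes this nominal figure. The function name `unit_bitrate` and the width and K values are hypothetical illustrations, not the settings or the exact bitrate definition used in the paper.

```python
import math


def unit_bitrate(segment_width_ms: float, num_clusters: int) -> float:
    """Nominal bitrate (bits/s) of a discrete unit stream.

    Assumes (hypothetically) one unit per fixed-width segment and a
    fixed-length code of log2(K) bits per unit, with no deduplication
    of repeated units and no entropy coding.
    """
    units_per_second = 1000.0 / segment_width_ms
    bits_per_unit = math.log2(num_clusters)
    return units_per_second * bits_per_unit


# Illustrative grid: wider segments and fewer clusters both lower the bitrate.
for width_ms in (20, 40, 80):
    for k in (50, 100, 200):
        print(f"width={width_ms} ms, K={k}: {unit_bitrate(width_ms, k):.1f} bit/s")
```

Doubling the segment width halves the unit rate, while doubling K adds only one bit per unit, so under this simple model segmentation width has the larger effect on bitrate.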

Impact: This research could lead to more efficient, less computationally demanding models for speech synthesis and continuation.

Ranking rationale: An academic paper detailing new methods and findings in speech synthesis.

Read on arXiv cs.CL.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Yusuke Miyao · 2026-06-22 12:58

On the Effect of Segmentation Width and Cluster Size on Speech Resynthesis and Continuation in Generative Spoken Language Models

Generative Spoken Language Modeling (GSLM) enables text-free speech modeling by training language models (LMs) using discrete speech representations instead of textual transcription. In this paper, we investigate the performance of GSLM on speech synthesis and continuation using …
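As a rough illustration of how discrete speech representations are commonly produced in GSLM-style pipelines, the sketch below fits k-means centroids to frame-level feature vectors with plain Lloyd iterations, maps each frame to its nearest centroid, and collapses consecutive repeated units. Random vectors stand in for the output of a self-supervised speech encoder. The helper names (`kmeans`, `quantize_to_units`, `merge_repeats`) are hypothetical, and the paper's actual encoder, clustering setup, and segmentation scheme may differ.

```python
import numpy as np


def quantize_to_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each frame (T x D) to the index of its nearest centroid (K x D)."""
    # Squared Euclidean distance between every frame and every centroid: T x K.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(features: np.ndarray, k: int, iters: int = 20, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a K x D array of centroids."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct frames chosen at random.
    centroids = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = quantize_to_units(features, centroids)
        for j in range(k):
            members = features[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def merge_repeats(units: np.ndarray) -> list[int]:
    """Collapse runs of identical consecutive units into a single unit."""
    out: list[int] = []
    for u in units.tolist():
        if not out or out[-1] != u:
            out.append(u)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.normal(size=(200, 16))  # stand-in for 200 encoder frames, dim 16
    centroids = kmeans(frames, k=8)
    units = quantize_to_units(frames, centroids)
    print(merge_repeats(units)[:20])
```

In this sketch the cluster count `k` sets the unit vocabulary size, while the frame rate of the stand-in features plays the role of segmentation width; the resulting unit sequence is what a GSLM language model would be trained on in place of text.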



