MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables

MELD speech model jointly optimizes the encoder and the language model

By the PulseAugur editorial team · [2 sources] · 2026-05-28 12:39

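Researchers have developed MELD, a new approach to speech language modeling that uses discrete latent variables over mel-spectrograms. The method jointly optimizes the encoder and the speech language model, addressing the limitations of optimizing the encoder separately. MELD shows improvements over existing baselines on zero-shot speech synthesis and speech recognition tasks, and it also mitigates problems common in autoregressive mel-spectrogram modeling, such as prolonged silences and skipped words.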

Impact: This joint optimization approach could lead to more powerful and more efficient speech synthesis and recognition systems.

Ranking rationale: This cluster contains an academic paper that details a new model and its method.

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Sung-Lin Yeh, Wei Zhou, Gil Keren, Duc Le, Zhong Meng, Hao Tang, Jay Mahadeokar, Ozlem Kalinli, Alexandre Mourachko · 2026-05-29 04:00

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables

arXiv:2605.29859v1 Announce Type: cross
Abstract: Recent speech language models rely on encoders that are optimized separately from autoregressive models. Since these encoders are unaware of the downstream objectives, the extracted representations may not be optimal for downstrea…

arXiv cs.CL TIER_1 English(EN) · Alexandre Mourachko · 2026-05-28 12:39

MELD: Mel-Spectrogram-Based Speech Language Modeling with Discrete Latent Variables

Recent speech language models rely on encoders that are optimized separately from autoregressive models. Since these encoders are unaware of the downstream objectives, the extracted representations may not be optimal for downstream tasks. To address this limitation, we introduce …