Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Whisper-GPT Model Combines Continuous and Discrete Audio Representations for Generation

By the PulseAugur Editorial Team · [1 source] · 2026-06-10 04:00

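Researchers have developed Whisper-GPT, a language model designed to generate speech and music. Within a single architecture, the model combines continuous audio representations (such as spectrograms) with discrete tokens produced by neural compression (codec) algorithms. The hybrid design aims to overcome the context-length limits that purely discrete-token models often run into, while keeping the predictive advantages of a discrete output space for tasks such as sampling.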
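To make the hybrid idea concrete, here is a minimal, dependency-free Python sketch under stated assumptions. It is not the paper's implementation: the sizes, the choice to fuse the two streams by addition, and the function names (`hybrid_input`, `next_token_distribution`) are hypothetical, and the transformer that would sit between the fused inputs and the output head is omitted. It shows one way a model can read a time-aligned spectrogram frame alongside each discrete codec token, while still predicting a categorical distribution over discrete tokens that can be sampled directly.

```python
import math
import random

# Hypothetical sizes for illustration only; not taken from the paper.
VOCAB_SIZE = 8   # number of discrete codec tokens
N_MELS = 4       # features per continuous spectrogram frame
D_MODEL = 6      # shared hidden width

rng = random.Random(0)  # fixed seed so the sketch is reproducible


def rand_matrix(rows, cols):
    """Small random matrix standing in for learned weights."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# In a trained model these would be learned; here they are random placeholders.
token_embedding = rand_matrix(VOCAB_SIZE, D_MODEL)  # discrete path: lookup table
frame_projection = rand_matrix(N_MELS, D_MODEL)     # continuous path: linear map
output_head = rand_matrix(D_MODEL, VOCAB_SIZE)      # predicts the next discrete token


def project_frame(frame):
    """Map one spectrogram frame (length N_MELS) into the shared D_MODEL space."""
    return [sum(frame[i] * frame_projection[i][j] for i in range(N_MELS))
            for j in range(D_MODEL)]


def hybrid_input(token_ids, frames):
    """Fuse each token embedding with its time-aligned frame by element-wise addition
    (an assumed fusion scheme; concatenation would be another option)."""
    if len(token_ids) != len(frames):
        raise ValueError("discrete and continuous streams must be time-aligned")
    return [[e + c for e, c in zip(token_embedding[t], project_frame(f))]
            for t, f in zip(token_ids, frames)]


def next_token_distribution(hidden):
    """Softmax over the discrete vocabulary from one fused hidden vector."""
    logits = [sum(hidden[i] * output_head[i][k] for i in range(D_MODEL))
              for k in range(VOCAB_SIZE)]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]


# Usage: three time steps, each with a codec token and a spectrogram frame.
tokens = [3, 1, 5]
frames = [[0.2, 0.1, 0.0, 0.4], [0.3, 0.0, 0.1, 0.2], [0.1, 0.5, 0.2, 0.0]]
fused = hybrid_input(tokens, frames)
probs = next_token_distribution(fused[-1])  # a real model would run a transformer first
next_id = rng.choices(range(VOCAB_SIZE), weights=probs, k=1)[0]  # categorical sample
```

Keeping the output over a discrete vocabulary makes generation a simple categorical draw, which is the sampling advantage of the discrete space mentioned above; in this sketch the continuous frames only enrich the input side.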

Impact: Introduces a hybrid approach to audio generation that may improve long-context handling and predictive capability.

Ranking rationale: This cluster contains a research paper detailing the architecture of a new audio generation model.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Prateek Verma · 2026-06-10 04:00

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

arXiv:2412.11449v2 Announce Type: replace-cross Abstract: We propose WHISPER-GPT: A generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There h…