Integrating Facial Generation into Full-Duplex Spoken Dialogue Systems

New dialogue system combines real-time facial generation with speech

By the PulseAugur editorial team · [1 source] · 2026-06-20 09:59

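Researchers have developed Moshi-Face, a new full-duplex spoken dialogue system that integrates facial generation with audio processing. The system uses a VQ-VAE to encode facial data into discrete tokens and a Face Transformer to generate those tokens non-autoregressively. The result is a model that produces synchronized speech and facial expressions in real time, achieving low-latency audio-visual alignment while preserving dialogue quality.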
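The discrete face tokens described above come from vector quantization, the central step of a VQ-VAE: each continuous feature vector is replaced by the index of its nearest entry in a learned codebook. The Python sketch below illustrates only that lookup, under stated assumptions. The two-dimensional features, the three-entry codebook, and the function names `quantize` and `dequantize` are hypothetical and are not taken from Moshi-Face. This summary does not describe the paper's encoder, decoder, or codebook size.

```python
from typing import List, Sequence

# Hypothetical type for one frame of continuous facial features
# (e.g. an encoder output). Real dimensionality is an assumption.
Vector = Sequence[float]


def quantize(frames: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature frame to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every code vector.
        distances = [
            sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook
        ]
        # The discrete token is the index of the closest code vector.
        tokens.append(min(range(len(codebook)), key=distances.__getitem__))
    return tokens


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look token indices back up in the codebook (the decoder's input side)."""
    return [list(codebook[t]) for t in tokens]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]
    tokens = quantize(frames, codebook)
    print(tokens)  # [0, 1, 2]
    print(dequantize(tokens, codebook))
```

In a complete model, sequences of indices like these are what a Transformer learns to predict, and a decoder maps the predicted indices back through the codebook into facial motion. This summary does not detail how the Face Transformer predicts them non-autoregressively.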

Impact: By synchronizing speech with facial movement, the system enables more natural and expressive human-computer interaction.

Ranking rationale: Research paper detailing a new model release.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Ryuichiro Higashinaka · 2026-06-20 09:59

Integrating Facial Generation into Full-Duplex Spoken Dialogue Systems

Full-duplex spoken dialogue models, such as Moshi, enable natural, low-latency voice conversations. However, they remain limited to the audio modality, lacking the facial expressions that are integral to human communication. We present Moshi-Face, the first full-duplex dialogue m…
