PulseAugur
TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

TokTalk generates facial animation from audio tokens

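Researchers have developed TokTalk, a system that generates expressive, real-time facial animation directly from the audio tokens produced by advanced large language models (LLMs). The approach bypasses the multiple processing stages of a conventional pipeline, such as speech recognition and synthesis, with the aim of producing more natural and more responsive avatar performances. TokTalk relies on a novel dataset and a chunk-based conditional flow matching model, and it showed competitive latency and superior quality in perceptual studies.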

Impact: By using the audio tokens of large language models (LLMs) directly, TokTalk enables more natural and more responsive avatar performances.

Ranking rationale: This cluster contains a research paper that details a new system for generating facial animation.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

  1. arXiv cs.CV · TIER_1 · English (EN) · Qingcheng Zhao, Yifang Pan, Karan Singh

    TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

    arXiv:2605.31294v1. Abstract: Recent advances in Audio-LLMs like GPT-4o have ushered in an era of conversational interaction with language models. Conversational avatars however, still seem robotic in facial expression and conversational flow, in part due to seq…

  2. arXiv cs.CV · TIER_1 · English (EN) · Karan Singh

    TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

    Recent advances in Audio-LLMs like GPT-4o have ushered in an era of conversational interaction with language models. Conversational avatars however, still seem robotic in facial expression and conversational flow, in part due to sequential stages of speech recognition, text gener…