RESEARCH · arXiv cs.CV · 2w · [2 sources]

TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

Researchers have developed TokTalk, a system that generates expressive, real-time facial animation directly from the audio tokens produced by audio-LLMs. Driving the animation from these tokens bypasses the traditional multi-stage pipeline of speech recognition and speech synthesis, with the aim of making avatar performances more natural and responsive. TokTalk relies on a new dataset and a Chunk-based Conditional Flow Matching model, and is reported to achieve competitive latency and superior quality in perceptual studies.

IMPACT Enables more natural and responsive avatar performances by directly using audio tokens from LLMs.

GPT-4o
arXiv
TokTalk