RESEARCH · arXiv cs.CV

ReFree: Towards Realistic Co-Speech Video Generation via Reward-Free RL and Multilevel Speech Guidance

Researchers have developed ReFree-S2V, a framework for generating realistic co-speech video. It combines a flow-matching model with a multi-level speech representation to achieve accurate lip synchronization and natural facial expressions. To improve head movements, it uses a reward-free reinforcement learning scheme that avoids the need for costly human annotations or handcrafted metrics. In the reported experiments, ReFree-S2V surpasses existing methods in both quantitative lip-sync accuracy and qualitative evaluations of naturalness.
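
For readers unfamiliar with flow matching, the sketch below shows the generic training objective in its common straight-line (linear interpolation) form. It is an illustration only and is not taken from the ReFree-S2V paper: the function names, the `cond` argument standing in for speech features, and the toy `zero_model` are all assumptions made for this example.

```python
import random

def linear_path_sample(x0, x1, t):
    """Point on the straight line from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Velocity of the straight-line path, constant in t: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict_velocity, x1, cond, rng):
    """Single-sample flow-matching loss: mean squared error between the
    model's predicted velocity and the straight-line target velocity."""
    x0 = [rng.gauss(0.0, 1.0) for _ in x1]   # noise sample
    t = rng.random()                          # random time in [0, 1)
    xt = linear_path_sample(x0, x1, t)
    v_pred = predict_velocity(xt, t, cond)    # hypothetical model call
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(x1)

# Toy usage: a placeholder "model" that always predicts zero velocity.
rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]    # stand-in for a data sample (e.g. a video latent)
cond = [0.1, 0.2]        # stand-in for speech conditioning features
zero_model = lambda xt, t, cond: [0.0] * len(xt)
loss = flow_matching_loss(zero_model, x1, cond, rng)
```

In a real system the placeholder would be a neural network trained to minimize this loss over many samples; how ReFree-S2V conditions on its multi-level speech representation is not described in this brief.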

IMPACT This research advances co-speech video generation, potentially improving virtual avatars and digital communication tools.

ReFree-S2V
Salaheldin Mohamed