PulseAugur

Hallo-Live achieves real-time audio-video avatar generation at 20 FPS

Researchers have developed Hallo-Live, a framework for real-time, text-driven audio-visual avatar generation. The system combines an asynchronous dual-stream diffusion approach with human-centric preference distillation to achieve high fidelity and precise audio-video synchronization. Hallo-Live runs at 20.38 FPS with low latency, a significant speed improvement that makes it suitable for interactive applications.
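The paper's implementation is not reproduced here, so the following is only a hypothetical sketch of the general idea behind an asynchronous dual-stream loop: audio chunks and video frames are produced at independent rates by separate coroutines and paired by timestamp before playback. All names, rates, and the pairing rule are assumptions for illustration, not the authors' method.

```python
import asyncio

# Assumed rates: the paper reports ~20.38 FPS; 50 Hz audio chunking is a guess.
VIDEO_FPS = 20
AUDIO_CHUNKS_PER_SEC = 50

async def produce(queue, rate_hz, label, n):
    """Emit n timestamped items for one stream; a real system would run a
    model step here instead of just yielding control."""
    for i in range(n):
        await asyncio.sleep(0)
        await queue.put((i / rate_hz, f"{label}-{i}"))
    await queue.put(None)  # end-of-stream sentinel

async def pair_streams(video_q, audio_q):
    """Pair each video frame with the closest-in-time audio chunk."""
    audio = []
    while (item := await audio_q.get()) is not None:
        audio.append(item)
    paired = []
    while (frame := await video_q.get()) is not None:
        ts, name = frame
        nearest = min(audio, key=lambda a: abs(a[0] - ts))
        paired.append((name, nearest[1]))
    return paired

async def main():
    video_q, audio_q = asyncio.Queue(), asyncio.Queue()
    results = await asyncio.gather(
        produce(video_q, VIDEO_FPS, "frame", 4),
        produce(audio_q, AUDIO_CHUNKS_PER_SEC, "chunk", 10),
        pair_streams(video_q, audio_q),
    )
    return results[2]

pairs = asyncio.run(main())
print(pairs[0])  # first video frame paired with its nearest audio chunk
```

The point of the asynchronous structure is that neither stream blocks the other: each advances at its own rate, and synchronization happens only at the pairing step, which is what allows low end-to-end latency.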

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Enables real-time, interactive avatar generation for applications like virtual assistants and streaming.

RANK_REASON This is a research paper detailing a new framework for real-time avatar generation.

Read on arXiv cs.CV →

COVERAGE [1]

  1. arXiv cs.CV TIER_1 · Chunyu Li, Jiaye Li, Ruiqiao Mei, Haoyuan Xia, Hao Zhu, Jingdong Wang, Siyu Zhu

    Hallo-Live: Real-Time Streaming Joint Audio-Video Avatar Generation with Asynchronous Dual-Stream and Human-Centric Preference Distillation

    arXiv:2604.23632v1 · Announce Type: new

    Abstract: Real-time text-driven joint audio-video avatar generation requires jointly synthesizing portrait video and speech with high fidelity and precise synchronization, yet existing audio-visual diffusion models remain too slow for interac…