Flex4DHuman uses diffusion models for 4D human reconstruction

By PulseAugur Editorial · [1 source] · 2026-06-12 04:00

Researchers have developed Flex4DHuman, a multi-view video diffusion model that turns a monocular or sparse multi-view video of a dynamic human subject into synchronized dense multi-view videos. Built on the Wan 2.1 1.3B text-to-video architecture, the model requires no explicit geometry priors such as skeletons or depth maps. Instead, it relies on relative camera-pose conditioning and a five-axis positional encoding. Downstream pipelines can then convert the generated videos into dynamic 4D Gaussian splats. The authors report state-of-the-art performance on benchmarks including DNA-Rendering and ActorsHQ.
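The summary does not say how the relative camera poses are computed, so the following is only a minimal sketch of the standard formulation, not the authors' implementation. It assumes each camera is described by a 4x4 rigid world-to-camera matrix; the function names and the `yaw_pose` helper are hypothetical and exist only for illustration.

```python
import math


def matmul4(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert_rigid(m):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    r_t = [[m[j][i] for j in range(3)] for i in range(3)]
    t = [m[i][3] for i in range(3)]
    new_t = [-sum(r_t[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_t[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def relative_pose(world_to_ref, world_to_target):
    """Transform that maps reference-camera coordinates to target-camera coordinates."""
    return matmul4(world_to_target, invert_rigid(world_to_ref))


def yaw_pose(deg, tx):
    """Hypothetical helper: world-to-camera pose rotated about Y and shifted along X."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0.0, s, tx],
            [0.0, 1.0, 0.0, 0.0],
            [-s, 0.0, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]


ref = yaw_pose(30.0, 1.0)      # input (reference) view
tgt = yaw_pose(75.0, -0.5)     # view to be generated
rel = relative_pose(ref, tgt)  # conditioning signal in this sketch
```

A relative pose of this kind depends only on how the two cameras sit with respect to each other, so it is unchanged when the whole scene is expressed in a different world coordinate frame. That property is one plausible reason to condition on relative rather than absolute poses.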

IMPACT Enables scalable 4D content creation from casual videos for simulation, gaming, and AR/VR.

RANK_REASON The cluster describes a new research paper detailing a novel method for 4D human reconstruction.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang, Jenq-Neng Hwang · 2026-06-12 04:00

Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

arXiv:2606.13655v1 Announce Type: new Abstract: We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike pr…
