PulseAugur

Flex4DHuman reconstructs 4D humans from video using diffusion models

Researchers have developed Flex4DHuman, a multi-view video diffusion model for reconstructing dynamic 4D humans from monocular or sparse multi-view video. Built on the Wan 2.1 1.3B text-to-video architecture, it requires no explicit geometry priors such as skeletons or depth maps. Instead, it uses relative camera-pose conditioning and a five-axis positional encoding to generate synchronized dense multi-view videos, which are then passed to 4D Gaussian Splatting for detailed 4D reconstruction. The authors report superior performance on benchmark datasets and point to applications in gaming, AR/VR, and video re-shooting.
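
To make "relative camera-pose conditioning" concrete, here is a minimal sketch of how a relative pose between two cameras is commonly derived from their world-to-camera extrinsics. It is standard rigid-transform algebra, not code from the paper: the summary does not say how Flex4DHuman computes or encodes its poses, and the function names (`relative_pose`, `rot_y`) and the example cameras are hypothetical.

```python
import math

def transpose(m):
    """Transpose a 3x3 matrix given as nested lists."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Multiply two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_pose(r_ref, t_ref, r_tgt, t_tgt):
    """Return (R_rel, t_rel) mapping reference-camera coordinates to
    target-camera coordinates.

    Assumption: both inputs are world-to-camera extrinsics, i.e.
    x_cam = R @ x_world + t. The result no longer depends on where the
    world origin was placed, which is the appeal of relative poses.
    """
    r_rel = matmul(r_tgt, transpose(r_ref))        # R_tgt @ R_ref^T
    rotated = matvec(r_rel, t_ref)
    t_rel = [t_tgt[i] - rotated[i] for i in range(3)]
    return r_rel, t_rel

def rot_y(deg):
    """Rotation about the y axis by `deg` degrees (illustrative helper)."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

# Two hypothetical cameras: a reference view and a view rotated by 90 degrees.
r_ref, t_ref = rot_y(0.0), [0.0, 0.0, 2.0]
r_tgt, t_tgt = rot_y(90.0), [0.0, 0.0, 2.0]
r_rel, t_rel = relative_pose(r_ref, t_ref, r_tgt, t_tgt)
```

In a setup like this, the input view would typically serve as the reference, and each target view would be described by its `(r_rel, t_rel)` pair rather than by absolute extrinsics.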

IMPACT Enables scalable 4D content creation from casual videos, potentially impacting AR/VR and gaming industries.

RANK_REASON This is a research paper describing a new model and methodology.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 English(EN) · Jen-Hao Cheng, Yipeng Wang, Hao Zhang, Gengshan Yang, Jenq-Neng Hwang

    Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

    arXiv:2606.13655v1. Abstract: We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike pr…

  2. arXiv cs.CV TIER_1 English(EN) · Jenq-Neng Hwang

    Flex4DHuman: Flexible Multi-view Video Diffusion for 4D Human Reconstruction

    We present Flex4DHuman, a multi-view video diffusion model that transforms a monocular or sparse multi-view video of a dynamic subject into synchronized dense multi-view videos using only relative camera-pose conditioning. Unlike prior human-centric methods that rely on skeletons…