VideoMDM: Towards 3D Human Motion Generation From 2D Supervision
Researchers have developed VideoMDM, a diffusion-based framework that generates 3D human motion using only 2D video supervision. The method learns 3D motion priors directly from 2D poses, so it needs no explicit 3D ground-truth data. Using a pretrained 2D-to-3D lifter as a noisy teacher together with a depth-weighted 2D reprojection loss, VideoMDM achieves performance close to fully 3D-supervised models on benchmarks like HumanML3D. The framework also works on real-world video datasets such as Fit3D and NBA, where it generates motions that are preferred by human evaluators.
IMPACT: Enables more accessible 3D motion generation for applications like animation and virtual reality by leveraging readily available 2D video data.
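
To make the depth-weighted 2D reprojection loss mentioned in the summary more concrete, here is a minimal NumPy sketch. The summary does not give the actual formulation, so everything below is an illustrative assumption rather than VideoMDM's implementation: the pinhole camera model, the `focal` parameter, the function names `project` and `depth_weighted_reprojection_loss`, and in particular the weighting scheme (each joint's squared 2D error scaled by its depth relative to the mean depth).

```python
import numpy as np


def project(joints_3d, focal=1.0):
    """Pinhole projection of (J, 3) camera-space joints to (J, 2) image coordinates.

    Assumes every joint has positive depth (z > 0).
    """
    z = joints_3d[:, 2:3]
    return focal * joints_3d[:, :2] / z


def depth_weighted_reprojection_loss(pred_3d, target_2d, focal=1.0):
    """Mean squared 2D reprojection error, weighted per joint by relative depth.

    The weighting below is an assumed choice for illustration only: joints
    farther from the camera project to smaller image displacements, so their
    errors are scaled up by depth / mean depth.
    """
    z = pred_3d[:, 2]
    weights = z / z.mean()                      # assumed weighting scheme
    residual = project(pred_3d, focal) - target_2d
    per_joint = np.sum(residual ** 2, axis=1)   # squared 2D error per joint
    return float(np.mean(weights * per_joint))


# Two joints at depths 2 and 4; the 2D target is the exact projection.
pred = np.array([[0.0, 0.0, 2.0], [1.0, 1.0, 4.0]])
target = project(pred)
loss_exact = depth_weighted_reprojection_loss(pred, target)
loss_shifted = depth_weighted_reprojection_loss(pred, target + 0.1)
```

In this sketch the loss is zero when the predicted 3D joints reproject exactly onto the observed 2D pose, and it grows as the 2D target moves away. Only 2D observations enter the loss, which is the property that would let a 3D motion model be trained without 3D ground truth.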