PulseAugur

VideoMDM generates 3D human motion from 2D video without 3D ground truth

Researchers have developed VideoMDM, a diffusion-based framework that learns to generate 3D human motion from 2D video supervision. The method trains 3D motion priors directly from 2D poses, with no explicit 3D ground truth. It uses a pretrained 2D-to-3D lifter as a noisy teacher and a depth-weighted 2D reprojection loss, and it performs close to fully 3D-supervised models on benchmarks such as HumanML3D. The framework also works on real-world video datasets such as Fit3D and NBA, where it generates motions that human evaluators preferred.
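To make the idea of a depth-weighted 2D reprojection loss concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the summary does not give the weighting formula, so the sketch assumes a pinhole camera and weights each joint's squared 2D error by its predicted depth, so that distant joints (whose projected errors shrink with 1/z) are not under-penalized. The names `project`, `depth_weighted_reprojection_loss`, and `focal` are hypothetical.

```python
from typing import Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project(joint: Point3D, focal: float = 1.0) -> Point2D:
    """Pinhole projection of a camera-space 3D joint onto the image plane."""
    x, y, z = joint
    if z <= 0:
        raise ValueError("joint must lie in front of the camera (z > 0)")
    return (focal * x / z, focal * y / z)


def depth_weighted_reprojection_loss(
    pred_joints: Sequence[Point3D],
    target_2d: Sequence[Point2D],
    focal: float = 1.0,
) -> float:
    """Weighted mean of squared 2D errors between projected and observed joints.

    ASSUMPTION: the weight of each joint is its predicted depth z. The actual
    weighting used by VideoMDM is not stated in the summary.
    """
    if not pred_joints or len(pred_joints) != len(target_2d):
        raise ValueError("need equally sized, non-empty joint lists")
    weighted_sum = 0.0
    weight_total = 0.0
    for joint, (u_obs, v_obs) in zip(pred_joints, target_2d):
        u, v = project(joint, focal)
        weight = joint[2]  # hypothetical choice: weight = predicted depth
        weighted_sum += weight * ((u - u_obs) ** 2 + (v - v_obs) ** 2)
        weight_total += weight
    return weighted_sum / weight_total


# Two predicted joints compared against 2D keypoints detected in a video frame.
pred = [(1.0, 0.0, 2.0), (0.0, 0.0, 1.0)]
observed = [(0.0, 0.0), (0.0, 0.0)]
loss = depth_weighted_reprojection_loss(pred, observed)
```

In a training setup like the one described, a loss of this kind would compare the projection of the generated 3D motion with the 2D poses extracted from video, while the lifter's approximate 3D output serves only as a noisy teacher.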

IMPACT Enables more accessible 3D motion generation for applications like animation and virtual reality by leveraging readily available 2D video data.

RANK_REASON This is a research paper detailing a new method for 3D human motion generation.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.LG TIER_1 English(EN) · Or Litany ·

    VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

    We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate 3D pose sequences that serve as a noisy teacher: t…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

    VideoMDM trains 3D human motion priors from 2D poses using a diffusion framework with 2D reprojection loss and 3D motion regularizers, achieving near-3D supervised performance without requiring 3D ground truth.

  3. arXiv cs.CV TIER_1 English(EN) · Amir Mann, Gal Michael Harari, Merav Keidar, Or Litany ·

    VideoMDM: Towards 3D Human Motion Generation From 2D Supervision

    arXiv:2606.13364v1. We introduce VideoMDM, a diffusion-based framework that trains 3D human motion priors directly from accurate 2D poses extracted from monocular videos, without any 3D ground truth. A pretrained 2D-to-3D lifter provides approximate …