New framework generates egocentric video with hand control from monocular video

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

Researchers have developed HandsOnWorld, a framework for generating egocentric video controlled by hand motion. Unlike existing methods that depend on multi-view or marker-based motion capture, it learns from unconstrained monocular video. To address the scarcity of 3D hand annotations in large egocentric datasets, the team created EgoVid-Pro, a dataset of clean hand trajectories derived from in-the-wild egocentric videos. They also introduced the Plücker Hand Map to disentangle camera motion from hand motion, improving reconstruction fidelity and control accuracy.
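
The summary does not say how the Plücker Hand Map is constructed. As general background only, Plücker coordinates describe a 3D line by a direction d and a moment m = p × d, where p is any point on the line; the moment does not depend on which point is chosen. The sketch below illustrates that property in plain Python. The function names are hypothetical, and this is not presented as the paper's method.

```python
# Background sketch: Plücker coordinates (d, m) of a 3D line through two points.
# Hypothetical helper names; NOT the paper's Plücker Hand Map.

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a, b):
    """Dot product of two 3-vectors given as tuples."""
    return sum(x * y for x, y in zip(a, b))


def plucker_from_points(p, q):
    """Return (d, m) for the line through p and q: d = q - p, m = p x d."""
    d = tuple(qi - pi for pi, qi in zip(p, q))
    m = cross(p, d)
    return d, m


# Two different point pairs on the same line (x = 1, z = 0, same spacing)
# yield the same coordinates, and d is always orthogonal to m.
d1, m1 = plucker_from_points((1, 0, 0), (1, 1, 0))
d2, m2 = plucker_from_points((1, 2, 0), (1, 3, 0))
```

Here `(d1, m1)` and `(d2, m2)` are equal, and `dot(d1, m1)` is zero, which is the constraint every valid Plücker pair satisfies.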

IMPACT This framework could enable more realistic and controllable AI-generated egocentric video content for applications like virtual reality and simulation.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Yushuo Chen, Xiaoyu Shi, Xiaoshi Wu, Xintao Wang, Pengfei Wan, Yebin Liu · 2026-07-03 04:00

HandsOnWorld: Unconstrained Egocentric Video Generation with Camera-Disentangled Hand Control

arXiv:2607.02075v1 Announce Type: new Abstract: We present HandsOnWorld, a framework for hand-controlled egocentric video generation that forgoes multi-view and marker-based motion capture, learning instead from unconstrained monocular video. Such generality is bottlenecked by th…
