Brief · PulseAugur

TOOL · arXiv cs.CV

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

Researchers have developed AGILE, a framework for reconstructing hand-object interactions from video. It uses an agentic pipeline in which a Vision-Language Model guides a generative model to produce complete object meshes, even under heavy occlusion. Instead of relying on traditional Structure-from-Motion, it uses a foundation model for initial pose estimation and temporal tracking, and it enforces physical plausibility through integrated constraints. The authors report that AGILE achieves superior geometric accuracy and robustness on challenging video sequences and produces simulation-ready assets for robotics.

IMPACT Enhances realism and utility of reconstructed 3D assets for robotics and VR applications.

Vision-Language Model
Structure-from-Motion
DexYCB
HO3D