AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
Researchers have developed AGILE, a framework for reconstructing hand-object interactions from video. Its agentic pipeline uses a Vision-Language Model to guide a generative model, producing complete object meshes even under heavy occlusion. Rather than relying on traditional Structure-from-Motion, AGILE uses a foundation model for initial pose estimation and temporal tracking, and it enforces physical plausibility through integrated constraints. AGILE is reported to achieve superior geometric accuracy and robustness on challenging video sequences, and it produces simulation-ready assets for robotics.
IMPACT: Enhances the realism and utility of reconstructed 3D assets for robotics and VR applications.