AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation
Two new research papers introduce frameworks for reconstructing 3D objects from egocentric videos of hand-object interactions. The first, ROHIT, uses a Constrained Optimisation and Propagation (COP) framework to model object poses during stable grasps. The second, AGILE, employs an agentic generation approach guided by a Vision-Language Model to create watertight meshes, bypassing traditional Structure-from-Motion methods.
AI IMPACT: These methods could improve digital twins for robotics and VR by enabling more accurate 3D object reconstruction from real-world interactions.