Brief · PulseAugur

RESEARCH · arXiv cs.CV · English (EN) · 1w · [3 sources]

AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

Two new research papers present frameworks for reconstructing 3D objects from egocentric videos, focusing on how hands interact with those objects. The first, ROHIT, uses a Constrained Optimisation and Propagation (COP) framework to model object poses during stable grasps. The second, AGILE, uses an agentic generation approach guided by a Vision-Language Model to produce watertight meshes, bypassing traditional Structure-from-Motion pipelines.
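A "watertight" mesh is a closed surface with no holes. The sketch below illustrates the property with a common simplified test: every undirected edge must be shared by exactly two triangles. It is not AGILE's method. The function name `is_watertight` and the face-list input format are hypothetical, and the test ignores orientation consistency and self-intersections.

```python
from collections import Counter


def is_watertight(faces):
    """Simplified closedness check (hypothetical helper, not from AGILE).

    faces: iterable of (i, j, k) vertex-index triples for triangles.
    Returns True if every undirected edge borders exactly two triangles.
    """
    edge_counts = Counter()
    for i, j, k in faces:
        # Count each undirected edge of the triangle once.
        for a, b in ((i, j), (j, k), (k, i)):
            edge_counts[(min(a, b), max(a, b))] += 1
    # An empty mesh is not considered watertight.
    return len(edge_counts) > 0 and all(c == 2 for c in edge_counts.values())


# A tetrahedron is closed: each of its 6 edges borders exactly 2 faces.
tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
# Dropping one face opens a hole, leaving 3 edges used by only one face.
open_tetra = tetra[:3]

print(is_watertight(tetra))       # True
print(is_watertight(open_tetra))  # False
```

A closed mesh of this kind has a well-defined inside and outside, which is why watertightness matters for downstream uses such as physics simulation in digital twins.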

IMPACT: These methods could improve digital twins for robotics and VR by enabling more accurate 3D object reconstruction from real-world hand-object interactions.

Vision-Language Model
Structure-from-Motion
DexYCB
HO3D
Zhifan Zhu
Jin-Chuan Shi
ROHIT
EPIC-Kitchens
HOT3D