ComPose: When to Trust Hands for Object Pose Tracking
Researchers have developed ComPose, a framework for 6DoF object pose tracking from RGB video that uses hand motion as a complementary cue. Rather than treating hands only as occluders, ComPose combines hand joint information with object cues from foundation models to estimate object motion. This improves accuracy and robustness, particularly under severe hand occlusion and geometric ambiguity, and the approach can transfer to downstream robot manipulation tasks.
AI IMPACT: This tracking method could improve embodied AI and robot manipulation by making object pose estimation more robust to hand occlusion.