CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning
Researchers have developed CoMo, a method for learning continuous latent motion from internet videos to improve robot learning. CoMo applies an early temporal difference mechanism that makes shortcut learning harder and explicitly strengthens motion cues. A temporal contrastive learning scheme further encourages the latent motion to capture meaningful foreground content: positive pairs are constructed with small future temporal offsets, and negative pairs by reversing the temporal direction. With this design, CoMo exhibits strong zero-shot generalization, generating effective pseudo action labels for unseen videos, and policies co-trained with these labels achieve superior performance.
AI IMPACT: Enables more scalable and effective robot learning by leveraging vast internet video data for motion understanding.
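The two mechanisms described above can be illustrated with a minimal sketch. This is one possible reading of the summary, not CoMo's actual implementation: the function names, the linear `toy_motion_encoder`, the InfoNCE-style loss, and the parameters `k`, `offset`, and `temperature` are all assumptions made for illustration. Frames are flat lists of pixel values to keep the example dependency-free.

```python
import math


def early_temporal_difference(frame_t, frame_future):
    """Subtract frames before any encoding, so static content cancels out."""
    return [b - a for a, b in zip(frame_t, frame_future)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def toy_motion_encoder(diff, proj):
    """Hypothetical stand-in encoder: linear projection, then L2 normalization.

    `proj` is a list of weight rows, one per latent dimension.
    A real encoder would be a learned, nonlinear network.
    """
    z = [dot(row, diff) for row in proj]
    norm = math.sqrt(dot(z, z)) + 1e-8
    return [v / norm for v in z]


def build_triplet(video, t, k, offset, proj):
    """Build (anchor, positive, negative) latent motions from one clip.

    Assumed construction:
      anchor   : motion from frame t to frame t + k
      positive : the same span shifted by a small future offset
      negative : the anchor pair with its temporal direction reversed
    """
    def encode(a, b):
        return toy_motion_encoder(early_temporal_difference(a, b), proj)

    anchor = encode(video[t], video[t + k])
    positive = encode(video[t + offset], video[t + k + offset])
    negative = encode(video[t + k], video[t])
    return anchor, positive, negative


def temporal_contrastive_loss(anchor, positive, negative, temperature=0.1):
    """InfoNCE-style loss with one positive and one negative (an assumption)."""
    logits = [dot(anchor, positive) / temperature,
              dot(anchor, negative) / temperature]
    m = max(logits)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[0]
```

Because the stand-in encoder is linear, reversing the frame order exactly negates the latent motion, which makes the negative easy to separate. With a learned nonlinear encoder that is not guaranteed, and the contrastive term is what would push the two directions apart.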