FROST-STA: Frozen Dense Features for the Ego4D Short-Term Object Interaction Anticipation
Researchers have developed FROST-STA, a system for short-term object interaction anticipation in egocentric video. The model keeps a ViT-G backbone frozen and extracts dense video and image tokens from it; these tokens are then fused and decoded to predict the boxes, labels, and time-to-contact of upcoming object interactions. FROST-STA placed second in the Ego4D Short-Term Object Interaction Anticipation Challenge, which suggests that frozen pre-trained features can be effective for interaction forecasting.
IMPACT: Shows that a frozen pre-trained backbone can be competitive for egocentric interaction anticipation, potentially improving human-robot interaction and autonomous systems.
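
The pipeline described in the summary (frozen features, token fusion, decoding into boxes, labels, and time-to-contact) can be illustrated with a toy sketch. This is not the FROST-STA implementation: the summary does not say how tokens are fused or decoded, so fusion by concatenation, a single cross-attention step, the tiny dimensions, and every name below (`frozen_backbone`, `predict`, `params`) are assumptions made purely for illustration. Random arrays stand in for ViT-G features.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 32            # token width (hypothetical; a real ViT-G is far wider)
NUM_CLASSES = 5   # hypothetical number of object labels
NUM_QUERIES = 4   # hypothetical number of predictions per clip


def frozen_backbone(n_tokens, d=D):
    """Stand-in for frozen backbone features: fixed tokens that are never trained."""
    return rng.standard_normal((n_tokens, d))


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cross_attention(queries, tokens):
    """Single-head scaled dot-product attention from queries to tokens."""
    scores = queries @ tokens.T / np.sqrt(tokens.shape[-1])
    return softmax(scores, axis=-1) @ tokens


# Only these parameters would be trained; the backbone stays frozen.
params = {
    "queries": rng.standard_normal((NUM_QUERIES, D)) * 0.1,
    "w_box": rng.standard_normal((D, 4)) * 0.1,
    "w_cls": rng.standard_normal((D, NUM_CLASSES)) * 0.1,
    "w_ttc": rng.standard_normal((D, 1)) * 0.1,
}


def predict(video_tokens, image_tokens, params):
    # Fusion by concatenating the two token sets (an assumption of this sketch).
    fused = np.concatenate([video_tokens, image_tokens], axis=0)
    # Each learned query gathers evidence from the fused tokens.
    decoded = cross_attention(params["queries"], fused)
    # Three lightweight heads, one per output named in the summary.
    boxes = sigmoid(decoded @ params["w_box"])              # normalized box coords in [0, 1]
    class_probs = softmax(decoded @ params["w_cls"])        # label distribution per query
    ttc = np.log1p(np.exp(decoded @ params["w_ttc"]))[:, 0]  # softplus keeps time-to-contact positive
    return boxes, class_probs, ttc


video_tokens = frozen_backbone(n_tokens=16)  # tokens from the video clip
image_tokens = frozen_backbone(n_tokens=9)   # tokens from the last frame
boxes, class_probs, ttc = predict(video_tokens, image_tokens, params)
print(boxes.shape, class_probs.shape, ttc.shape)
```

The point of the design is that the expensive backbone is computed once and never updated, so only the small set of queries and output heads would need training.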