GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture
Researchers have introduced GOT-JEPA, a pretraining framework designed to improve generic object tracking. The method extends the Joint-Embedding Predictive Architecture (JEPA) to predict tracking models rather than only image features. GOT-JEPA trains a student predictor on corrupted frames alongside a teacher predictor that sees clean frames, which makes the learned tracker more robust to occlusions and environmental changes. In addition, the OccuSolver component refines occlusion perception by adapting point-centric trackers for object-aware visibility estimation and fine-grained capture of occlusion patterns, which improves generalization across a range of tracking benchmarks.
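The student/teacher setup can be illustrated with a toy sketch. Everything below is a hypothetical simplification, not the GOT-JEPA implementation. Frames are plain feature vectors. Occlusion is simulated by zeroing a contiguous run of features (`occlude`). The predictor is a linear map over averaged frame features whose output stands in for tracking-model parameters (`predict_model`). The teacher's prediction on clean frames serves as a fixed target for the student's prediction on corrupted frames. The teacher follows the student through an exponential moving average (`ema_update`), a common JEPA-style choice that the summary does not confirm.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]  # rows = tracking-model parameters, cols = feature dims


def occlude(frame: Vector, start: int, length: int) -> Vector:
    """Hypothetical corruption: zero a contiguous run of features to mimic an occluder."""
    return [0.0 if start <= i < start + length else x for i, x in enumerate(frame)]


def pool(frames: List[Vector]) -> Vector:
    """Average the frames feature-wise (a stand-in for a real temporal encoder)."""
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / len(frames) for i in range(dim)]


def predict_model(weights: Matrix, frames: List[Vector]) -> Vector:
    """Hypothetical linear predictor: map pooled frame features to a vector
    that stands in for the parameters of a tracking model."""
    p = pool(frames)
    return [sum(w * x for w, x in zip(row, p)) for row in weights]


def mse(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distill_loss(student: Matrix, teacher: Matrix,
                 clean: List[Vector], corrupted: List[Vector]) -> float:
    """Student sees corrupted frames, teacher sees clean frames; compare predicted models."""
    return mse(predict_model(student, corrupted), predict_model(teacher, clean))


def student_step(student: Matrix, teacher: Matrix, clean: List[Vector],
                 corrupted: List[Vector], lr: float) -> Tuple[Matrix, float]:
    """One gradient-descent step on the student. The teacher's prediction is
    treated as a constant target (stop-gradient), so only the student changes.
    Returns the updated student and the loss measured before the update."""
    target = predict_model(teacher, clean)
    p = pool(corrupted)
    pred = [sum(w * x for w, x in zip(row, p)) for row in student]
    k = len(pred)
    # d(MSE)/dW[r][c] = (2 / k) * (pred[r] - target[r]) * p[c]
    updated = [
        [w - lr * (2.0 / k) * (pred[r] - target[r]) * p[c] for c, w in enumerate(row)]
        for r, row in enumerate(student)
    ]
    return updated, mse(pred, target)


def ema_update(teacher: Matrix, student: Matrix, decay: float) -> Matrix:
    """Move the teacher slowly toward the student (assumed moving-average teacher)."""
    return [
        [decay * t + (1.0 - decay) * s for t, s in zip(t_row, s_row)]
        for t_row, s_row in zip(teacher, student)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k = 8, 4
    student = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(k)]
    teacher = [row[:] for row in student]  # teacher starts as a copy of the student
    clean = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(3)]
    corrupted = [occlude(f, start=2, length=3) for f in clean]
    for step in range(5):
        student, loss = student_step(student, teacher, clean, corrupted, lr=0.1)
        teacher = ema_update(teacher, student, decay=0.99)
        print(f"step {step}: loss before update = {loss:.6f}")
```

Only the student receives gradients here; the teacher changes solely through `ema_update`. Whether GOT-JEPA uses a moving-average teacher or a different update rule is not stated in the summary.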