Momentum-Guided Semantic Forecasting (MoFore) for Self-Supervised Video Representation Learning
Researchers have introduced MoFore, a framework for self-supervised video representation learning that forecasts the latent embeddings of future clips from temporally distant context clips. Unlike previous methods that relied on pixel-level reconstruction or semantic alignment, MoFore is trained to learn temporally predictive representations. The framework combines randomized temporal-gap forecasting, which improves robustness, with contrastive regularization, which prevents representation collapse. Experiments on the UCF101 dataset showed that MoFore learns temporally consistent and semantically meaningful representations without requiring action labels.
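The ingredients named above can be illustrated with a minimal sketch. It is not MoFore's actual implementation, and the summary gives no architectural details. The function names, the gap range, the temperature, and the choice of InfoNCE as the contrastive term are all assumptions made for illustration. Reading "momentum" in the title as an exponential-moving-average (EMA) target encoder is likewise an assumption.

```python
import math
import random


def sample_gap_pair(num_clips, min_gap, max_gap, rng):
    """Pick a context clip index and a future target index separated by a
    randomly drawn temporal gap (hypothetical helper, not from the source)."""
    if max_gap >= num_clips:
        raise ValueError("max_gap must be smaller than num_clips")
    gap = rng.randint(min_gap, max_gap)  # both bounds inclusive
    context = rng.randint(0, num_clips - 1 - gap)
    return context, context + gap


def ema_update(target_params, online_params, momentum=0.99):
    """Assumed momentum rule: the target encoder slowly tracks the online one."""
    return [momentum * t + (1.0 - momentum) * o
            for t, o in zip(target_params, online_params)]


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(predictions, targets, temperature=0.1):
    """Contrastive forecasting loss: prediction i should match target i and
    differ from every other target in the batch. Pulling all embeddings to a
    single point makes the logits uniform, so collapse is penalized."""
    preds = [l2_normalize(p) for p in predictions]
    tgts = [l2_normalize(t) for t in targets]
    total = 0.0
    for i, pred in enumerate(preds):
        logits = [sum(a * b for a, b in zip(pred, t)) / temperature
                  for t in tgts]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(preds)
```

In a training loop built on this sketch, each step would draw a (context, target) pair with `sample_gap_pair`, encode the context with the online encoder and the target with the EMA encoder, score the forecasts with `info_nce`, and then call `ema_update`. Drawing the gap at random rather than fixing it is what keeps the predictor from specializing to a single forecasting horizon.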