Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation
Researchers have developed Video-OPD, a post-training framework for temporal video grounding based on on-policy distillation. The method trains on trajectories sampled directly from the current policy, which keeps the training and inference distributions aligned. Video-OPD converts sparse, episode-level feedback into fine-grained, step-wise learning signals, and it outperforms existing GRPO-based methods in efficiency and convergence speed.
IMPACT: Introduces a more efficient training paradigm for temporal video grounding, potentially accelerating development in multimodal AI.
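
To make the contrast between episode-level and step-wise feedback concrete, here is a minimal sketch in plain Python. It is not Video-OPD's implementation: the summary does not state the exact objective, so the per-token reverse KL against a teacher distribution used below is an assumption (it is one common form of on-policy distillation). The function names `temporal_iou`, `reverse_kl`, and `stepwise_signals`, and the toy distributions, are hypothetical and exist only for illustration.

```python
import math


def temporal_iou(pred, gold):
    """Episode-level feedback: one scalar for a whole predicted (start, end) segment."""
    inter = max(0.0, min(pred[1], gold[1]) - max(pred[0], gold[0]))
    union = (pred[1] - pred[0]) + (gold[1] - gold[0]) - inter
    return inter / union if union > 0 else 0.0


def reverse_kl(student_probs, teacher_probs):
    """KL(student || teacher) at one token position.

    Assumes the teacher assigns non-zero probability wherever the student does.
    """
    return sum(
        p * math.log(p / q)
        for p, q in zip(student_probs, teacher_probs)
        if p > 0
    )


def stepwise_signals(student_dists, teacher_dists):
    """Step-wise feedback: one value per generated token of the student's own rollout."""
    return [reverse_kl(s, t) for s, t in zip(student_dists, teacher_dists)]


# Toy rollout of three tokens over a two-token vocabulary (made-up numbers).
student = [[0.5, 0.5], [0.9, 0.1], [0.2, 0.8]]
teacher = [[0.5, 0.5], [0.5, 0.5], [0.1, 0.9]]

episode_reward = temporal_iou((2.0, 6.0), (4.0, 8.0))  # a single number
token_signals = stepwise_signals(student, teacher)      # one number per token
```

In this sketch the IoU-style reward says only how good the final segment was, while the per-token values indicate which positions in the student's own rollout diverge from the teacher.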