Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

Video-OPD framework strengthens multimodal large language models for video grounding

By the PulseAugur editorial team · [1 source] · 2026-06-03 04:00

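Researchers have developed Video-OPD, a new post-training framework for temporal video grounding built on On-Policy Distillation. The method optimizes trajectories sampled directly from the current policy, which keeps the training and inference distributions consistent. Video-OPD converts sparse, episode-level feedback into fine-grained, step-wise learning signals, and it outperforms existing GRPO-based methods in efficiency and convergence speed.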
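The contrast between sparse and step-wise feedback can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not specify the reward or the distillation objective, so the sketch assumes a temporal IoU score as the single episode-level reward and a per-token teacher-versus-student log-probability gap (a common on-policy distillation formulation) as the dense signal. The function names and all numeric values are hypothetical.

```python
import math

def temporal_iou(pred, gt):
    """IoU of two (start, end) intervals in seconds.
    Assumed here as the sparse, episode-level reward."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def dense_token_signals(student_probs, teacher_probs):
    """Per-token signal log p_teacher - log p_student, evaluated on
    tokens the student itself sampled (the on-policy part).
    Assumed formulation; negative values mark tokens the teacher
    finds less likely than the student does."""
    return [math.log(t) - math.log(s)
            for s, t in zip(student_probs, teacher_probs)]

# Hypothetical trajectory: the student predicts a segment as three tokens.
student_probs = [0.9, 0.5, 0.4]   # student probability of each sampled token
teacher_probs = [0.9, 0.25, 0.8]  # teacher probability of the same tokens

# Sparse feedback: one scalar for the whole trajectory.
sparse_reward = temporal_iou(pred=(2.0, 6.0), gt=(4.0, 8.0))

# Dense feedback: one value per generated token.
dense_rewards = dense_token_signals(student_probs, teacher_probs)

print("sparse:", sparse_reward)
print("dense:", dense_rewards)
```

With the sparse reward, every token in the trajectory shares the same scalar, so the learner cannot tell which token caused a poor prediction. With the dense variant, each token receives its own value, which is one plausible reading of the "step-wise learning signals" described above.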

Impact: Introduces a more efficient training paradigm for temporal video grounding, which may accelerate progress in multimodal AI.

Ranking rationale: This cluster contains one paper detailing a new method for multimodal large language models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Jiaze Li, Hao Yin, Haoran Xu, Boshen Xu, Wenhui Tan, Zewen He, Jianzhong Ju, Zhenbo Luo, Jian Luan · 2026-06-03 04:00

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

arXiv:2602.02994v3 Announce Type: replace Abstract: Reinforcement learning has emerged as a principled post-training paradigm for Temporal Video Grounding (TVG) due to its on-policy optimization, yet existing GRPO-based methods remain fundamentally constrained by sparse reward si…
