
OmniVTG Dataset and CoT Paradigm Advance Open-World Video Temporal Grounding

By PulseAugur Editorial Team · [3 sources] · 2026-04-28 04:00

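Researchers have introduced OmniVTG, a large-scale dataset and training paradigm designed to improve open-world video temporal grounding (VTG) for multimodal large language models (MLLMs). The dataset uses a novel pipeline to identify and collect videos containing underrepresented concepts, and a caption-centric strategy for high-quality annotation. The work also proposes a self-correcting chain-of-thought (CoT) training method that draws on the MLLM's own understanding capabilities to refine its predictions, achieving state-of-the-art performance on existing benchmarks and on the new OmniVTG dataset.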

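Impact: The new dataset and training paradigm may improve the ability of multimodal models to accurately localize video segments from text queries.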

Ranking rationale: This cluster contains two academic papers describing new datasets and training methods for video temporal grounding.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.CV TIER_1 English(EN) · Minghang Zheng, Zihao Yin, Yi Yang, Yuxin Peng, Yang Liu · 2026-04-29 04:00

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

arXiv:2604.25276v1 Announce Type: new Abstract: Video Temporal Grounding (VTG), the task of localizing video segments from text queries, struggles in open-world settings due to limited dataset scale and semantic diversity, causing performance gaps between common and rare concepts…

arXiv cs.CV TIER_1 English(EN) · Yang Liu · 2026-04-28 06:34

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

Video Temporal Grounding (VTG), the task of localizing video segments from text queries, struggles in open-world settings due to limited dataset scale and semantic diversity, causing performance gaps between common and rare concepts. To overcome these limitations, we introduce Om…

arXiv cs.CV TIER_1 English(EN) · Thong Thanh Nguyen, Yi Bin, Xiaobao Wu, Zhiyuan Hu, Cong-Duy T Nguyen, See-Kiong Ng, Anh Tuan Luu · 2026-04-28 04:00

Multi-Scale Contrastive Learning for Video Temporal Grounding

arXiv:2412.07157v3 Announce Type: replace Abstract: Temporal grounding, which localizes video moments related to a natural language query, is a core problem of vision-language learning and video understanding. To encode video moments of varying lengths, recent methods employ a mu…
