New OMTG benchmark: model trained with novel reward functions outperforms Gemini 2.5 Pro

By PulseAugur Editorial · [3 sources] · 2026-06-04 00:00

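Researchers have introduced a new benchmark and dataset for One-to-Many Temporal Grounding (OMTG), a task that requires localizing multiple video segments corresponding to a single text query. Existing multimodal large language models (MLLMs) struggle with OMTG because they lack awareness of event cardinality, that is, how many segments a query should match. The proposed solution uses novel temporal and caption reward functions together with Chain-of-Thought reasoning to improve precision and completeness. In experiments, the approach sets a new state of the art with an effective temporal F1 score of 43.65%, significantly outperforming models such as Gemini 2.5 Pro and Seed-1.8.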
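To make the one-to-many setting concrete, here is a minimal sketch of how a segment-level F1 can be computed when a query has several ground-truth segments. The summary does not define how the paper's "effective temporal F1" is calculated, so this is a generic illustration rather than the authors' metric: the names `temporal_iou` and `segment_f1`, the greedy one-to-one matching, and the 0.5 IoU threshold are all assumptions made for the example.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segment_f1(pred: List[Segment], gt: List[Segment],
               iou_threshold: float = 0.5) -> float:
    """Generic segment-level F1 (an assumption, not the paper's definition).

    Predictions are matched to ground-truth segments greedily, best IoU
    first, with each segment used at most once.
    """
    # Score every (prediction, ground truth) pair, highest IoU first.
    pairs = sorted(
        ((temporal_iou(p, g), i, j)
         for i, p in enumerate(pred)
         for j, g in enumerate(gt)),
        reverse=True,
    )
    used_pred, used_gt = set(), set()
    for iou, i, j in pairs:
        if iou < iou_threshold:
            break  # remaining pairs overlap too little to count
        if i in used_pred or j in used_gt:
            continue  # one-to-one matching only
        used_pred.add(i)
        used_gt.add(j)

    matched = len(used_pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)  # penalizes extra predicted segments
    recall = matched / len(gt)       # penalizes missed segments
    return 2 * precision * recall / (precision + recall)


# Three true segments, but the model only predicts two of them.
gt_segments = [(0.0, 10.0), (20.0, 30.0), (40.0, 50.0)]
pred_segments = [(1.0, 10.0), (21.0, 29.0)]
score = segment_f1(pred_segments, gt_segments)
```

Under this kind of scoring, a model that predicts too few segments loses recall and one that predicts too many loses precision, which is why awareness of how many segments a query matches matters for the task.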

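Impact: Establishes a new benchmark and dataset for multi-segment video retrieval, advancing MLLM capabilities on complex temporal grounding tasks.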

Ranking rationale: The cluster contains one research paper that introduces a new benchmark, dataset, and model for a specific AI task.


AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

arXiv cs.AI TIER_1 English(EN) · Jason Li · 2026-06-04 15:31

Towards One-to-Many Temporal Grounding

Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term O…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-04 00:00

Towards One-to-Many Temporal Grounding

One-to-Many Temporal Grounding addresses the challenge of localizing multiple disjoint video segments for a single textual query through a comprehensive benchmark, novel reward functions, and improved policy optimization.
arXiv cs.CV TIER_1 English(EN) · Qi Xu, Yue Tan, Shihao Chen, Jiahao Meng, Anna Wang, Shunping Ji, Hao Fei, Jason Li · 2026-06-05 04:00

Towards One-to-Many Temporal Grounding

arXiv:2606.06294v1 · Abstract: Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint se…
