PulseAugur

New OMTG benchmark: method with novel reward functions outperforms Gemini 2.5 Pro

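Researchers have introduced a new benchmark and dataset for One-to-Many Temporal Grounding (OMTG), the task of localizing multiple video segments that correspond to a single text query. Existing multimodal large language models (MLLMs) struggle with OMTG because they lack awareness of event cardinality, that is, of how many segments match the query. The proposed solution adds novel temporal and caption reward functions and uses Chain-of-Thought reasoning to improve both precision and completeness. Experiments report a new state-of-the-art effective temporal F1 score of 43.65%, significantly outperforming models such as Gemini 2.5 Pro and Seed-1.8.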
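
To make the precision/completeness trade-off concrete, here is a minimal Python sketch of one common way to score multi-segment predictions: match predicted and ground-truth segments one-to-one by temporal IoU, then compute F1. This is an illustration only. The summary does not define the paper's "effective temporal F1", so the function names, the greedy matching, and the 0.5 IoU threshold below are all assumptions, not the authors' metric.

```python
from typing import List, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec); hypothetical representation


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def segment_f1(pred: List[Segment], gold: List[Segment],
               iou_thresh: float = 0.5) -> float:
    """F1 over segments using greedy one-to-one matching (assumed scheme)."""
    # Score every (prediction, ground truth) pair, best overlap first.
    pairs = sorted(
        ((temporal_iou(p, g), i, j)
         for i, p in enumerate(pred)
         for j, g in enumerate(gold)),
        reverse=True,
    )
    used_pred, used_gold = set(), set()
    for iou, i, j in pairs:
        if iou < iou_thresh:
            break  # remaining pairs overlap too little to count as hits
        if i in used_pred or j in used_gold:
            continue  # each segment may be matched at most once
        used_pred.add(i)
        used_gold.add(j)
    true_pos = len(used_pred)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)  # penalizes spurious segments
    recall = true_pos / len(gold)     # penalizes missed segments
    return 2 * precision * recall / (precision + recall)


gold = [(0.0, 10.0), (20.0, 30.0), (40.0, 50.0)]
print(segment_f1([(0.0, 10.0)], gold))                # one of three events found
print(segment_f1([(0.0, 10.0), (21.0, 30.0)], gold))  # two of three events found
```

Under this scheme, a model that returns a single perfect segment for a three-event query keeps full precision but loses recall. That is one way the cardinality problem described above would surface in an F1-style score.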

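Impact: Establishes a new benchmark and dataset for multi-segment video retrieval, advancing MLLM capabilities on complex temporal grounding tasks.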

Ranking rationale: This cluster contains one research paper introducing a new benchmark, dataset, and model for a specific AI task.


AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Jason Li ·

    Towards One-to-Many Temporal Grounding

    Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint segments for a single query -- a setting we term O…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    Towards One-to-Many Temporal Grounding

    One-to-Many Temporal Grounding addresses the challenge of localizing multiple disjoint video segments for a single textual query through a comprehensive benchmark, novel reward functions, and improved policy optimization.

  3. arXiv cs.CV TIER_1 English(EN) · Qi Xu, Yue Tan, Shihao Chen, Jiahao Meng, Anna Wang, Shunping Ji, Hao Fei, Jason Li ·

    Towards One-to-Many Temporal Grounding

    arXiv:2606.06294v1 Announce Type: new Abstract: Temporal Grounding (TG) aims to localize video segments corresponding to a textual query. Prior research predominantly focuses on single-segment retrieval. Real-world scenarios, however, often require localizing multiple disjoint se…