Temporal-Aware Reasoning Optimization for Video Temporal Grounding

New Benchmarks and Frameworks Improve Video Temporal Grounding

By the PulseAugur editorial team · [6 sources] · 2026-06-08 09:21

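Researchers have introduced new benchmarks and frameworks to improve temporal grounding in long videos. One study argues that grounding in hour-long videos is primarily a search problem rather than a recognition problem, and releases the ExtremeWhenBench benchmark to support that position. A second approach, TaRO, enhances multimodal large language models by optimizing their reasoning process with temporal awareness and a novel reward system. A third approach, CACR, uses candidate selection and causal reasoning to achieve state-of-the-art performance on temporal answer grounding in instructional videos.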

Impact: The new methods and benchmarks aim to improve AI's ability to understand and retrieve information from long videos.

Ranking rationale: Multiple research papers introduce new benchmarks and frameworks for video temporal grounding.


AI-generated summary · Google Gemini · based on 6 sources.

Sources [6]

arXiv cs.AI TIER_1 English(EN) · Sukmin Seo, Geewook Kim · 2026-06-11 04:00

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

arXiv:2606.12300v1 Announce Type: cross Abstract: Temporal grounding--returning the interval $[t_s, t_e]$ for a natural-language query over a video--is the language interface to long-form video, yet has been studied on short videos; the dynamics of hour-scale natural-language gro…

arXiv cs.AI TIER_1 English(EN) · Geewook Kim · 2026-06-10 16:35

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

Temporal grounding--returning the interval $[t_s, t_e]$ for a natural-language query over a video--is the language interface to long-form video, yet has been studied on short videos; the dynamics of hour-scale natural-language grounding remain underexplored. We take the position …

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 09:21

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with reinforcement learning for generating reasoning paths. However, existing models often produce superficial reasoning, which offers limited guidance for precise temporal loc…

arXiv cs.CV TIER_1 English(EN) · Muge Qi, Rong Fu, Pengbin Feng, Xianda Li, Yu Cai, Yifu Guo, Shizhe Zhang, Simon James Fong, Lei Ma, Bin Li · 2026-06-09 04:00

Enhancing Temporal Answer Grounding in Instructional Videos via Candidate-Aware Causal Reasoning

arXiv:2606.08436v1 Announce Type: new Abstract: The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remain…

arXiv cs.CV TIER_1 English(EN) · Minghang Zheng, Zihao Yin, Yi Yang, Yuxin Peng, Yang Liu · 2026-06-09 04:00

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

arXiv:2606.09248v1 Announce Type: new Abstract: Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with reinforcement learning for generating reasoning paths. However, existing models often produce superficial reasoning, which …

arXiv cs.CV TIER_1 English(EN) · Yang Liu · 2026-06-08 09:21

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with reinforcement learning for generating reasoning paths. However, existing models often produce superficial reasoning, which offers limited guidance for precise temporal loc…

