English (EN) · Temporal-Aware Reasoning Optimization for Video Temporal Grounding
New Benchmark and Frameworks Improve Video Temporal Grounding
By the PulseAugur editorial team · [6 sources]
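Researchers have introduced a new benchmark and new frameworks to improve temporal grounding in long videos. One study argues that hour-scale video grounding is primarily a search problem rather than a recognition problem, and releases the ExtremeWhenBench benchmark to support that position. A second approach, TaRO, enhances multi-modal large language models by optimizing their reasoning process with temporal awareness and a novel reward system. A third approach, CACR, uses candidate selection and causal reasoning to achieve state-of-the-art performance on temporal grounding tasks in instructional videos.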
AI
arXiv:2606.12300v1 Announce Type: cross Abstract: Temporal grounding--returning the interval $[t_s, t_e]$ for a natural-language query over a video--is the language interface to long-form video, yet has been studied on short videos; the dynamics of hour-scale natural-language grounding remain underexplored. We take the position …
arXiv cs.CV
TIER_1 · English (EN) · Muge Qi, Rong Fu, Pengbin Feng, Xianda Li, Yu Cai, Yifu Guo, Shizhe Zhang, Simon James Fong, Lei Ma, Bin Li
arXiv:2606.08436v1 Announce Type: new Abstract: The task of temporal answer grounding in instructional video (TAGV), which aims to locate precise video segments that respond to natural language queries, is increasingly important for direct video answer retrieval. This task remain…
arXiv cs.CV
TIER_1 · English (EN) · Minghang Zheng, Zihao Yin, Yi Yang, Yuxin Peng, Yang Liu
arXiv:2606.09248v1 Announce Type: new Abstract: Multi-modal Large Language Models (MLLMs) have achieved remarkable progress in video temporal grounding with reinforcement learning for generating reasoning paths. However, existing models often produce superficial reasoning, which offers limited guidance for precise temporal loc…