LongEgoRefer: A Benchmark for Long-Form Egocentric Video Referring Expression Comprehension

New benchmark LongEgoRefer challenges AI with long-form egocentric video understanding

By the PulseAugur editorial team · [2 sources] · 2026-07-02 12:32

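Researchers have introduced LongEgoRefer, a new benchmark for evaluating video referring expression comprehension in long-form egocentric videos. Built from the Ego4D dataset, it contains nearly 1,500 referring expressions over videos averaging 45 minutes in length, and it poses challenges such as objects that appear only sparsely and complex human-object interactions. State-of-the-art models, and even training-free baselines, perform poorly on LongEgoRefer, underscoring the need for more capable video understanding models that can perform spatio-temporal grounding over extended, dynamic narratives.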
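To make "spatio-temporal grounding" concrete: a Video REC prediction has to say both when the referred object is present and where it is in each of those frames. The sketch below scores such a prediction with a vIoU-style measure, a metric used in earlier spatio-temporal video grounding work: per-frame box IoU is summed over frames where prediction and ground truth overlap in time, then divided by the number of frames in their temporal union. This is an illustrative assumption only. The summary does not say which metric LongEgoRefer uses, and the function names and the frame-to-box annotation format here are hypothetical.

```python
# Hypothetical sketch of a vIoU-style score for spatio-temporal grounding.
# Not LongEgoRefer's official metric; the annotation format is assumed.

Box = tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def viou(pred: dict[int, Box], gt: dict[int, Box]) -> float:
    """Score a prediction against ground truth.

    Both arguments map a frame index to the object's box in that frame;
    the set of keys is the temporal extent. Using frame sets (assumption)
    also covers a target that appears in several disjoint segments.
    """
    union_frames = pred.keys() | gt.keys()  # frames in either extent
    if not union_frames:
        return 0.0
    inter_frames = pred.keys() & gt.keys()  # frames in both extents
    total = sum(box_iou(pred[t], gt[t]) for t in inter_frames)
    # Frames outside the temporal overlap contribute 0, penalizing
    # predictions that localize the object at the wrong time.
    return total / len(union_frames)


if __name__ == "__main__":
    gt = {0: (0, 0, 10, 10), 1: (0, 0, 10, 10)}
    pred = {1: (0, 0, 10, 10), 2: (0, 0, 10, 10)}
    print(viou(pred, gt))
```

In this toy case the boxes match perfectly on frame 1, but the prediction misses frame 0 and adds frame 2, so the score is 1/3. In hour-long videos where the object appears only sparsely, getting the temporal extent right dominates such a score.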

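Impact: The benchmark should push AI models toward better understanding of complex, long-form egocentric video, a capability that is essential for applications that analyze human-object interactions.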

Ranking rationale: This cluster describes a new benchmark for a computer vision task, published as an arXiv paper.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV · TIER_1 · Shunya Kato, Taiki Miyanishi, Shuhei Kurita, Mahiro Ukai, Nakamasa Inoue, Chenhui Chu · 2026-07-03 04:00

LongEgoRefer: A Benchmark for Long-Form Egocentric Video Referring Expression Comprehension

arXiv:2607.02096v1 · Announce Type: new

Abstract: Egocentric videos capture rich and diverse human-object interactions and have emerged as a fundamental resource for understanding human activities related to objects. In this context, Video Referring Expression Comprehension (Video …

arXiv cs.CV · TIER_1 · Chenhui Chu · 2026-07-02 12:32

LongEgoRefer: A Benchmark for Long-Form Egocentric Video Referring Expression Comprehension

Egocentric videos capture rich and diverse human-object interactions and have emerged as a fundamental resource for understanding human activities related to objects. In this context, Video Referring Expression Comprehension (Video REC), the task of localizing the temporal and sp…
