PulseAugur

DenseStep2M pipeline automates video annotation to improve understanding

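Researchers have developed DenseStep2M, a training-free pipeline that automatically extracts detailed procedural annotations from instructional videos. The system segments each video, filters out irrelevant content, and uses multimodal and large language models such as Qwen2.5-VL and DeepSeek-R1 to generate structured, timestamped steps. The resulting DenseStep2M dataset contains about 100,000 videos and 2 million steps, and it significantly improves performance on tasks such as dense video captioning and temporal grounding.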
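The three stages named in the summary (segment, filter, generate timestamped steps) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Segment`, `Step`, `split_into_windows`, and `annotate` are hypothetical names, the fixed-length windowing is an assumed stand-in for whatever segmentation the paper uses, and the model calls are replaced by plain callables so the sketch runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Segment:
    """A slice of a video with its (possibly noisy) transcript. Hypothetical type."""
    start: float  # seconds
    end: float    # seconds
    transcript: str


@dataclass
class Step:
    """One structured, timestamped procedural step. Hypothetical type."""
    start: float
    end: float
    description: str


def split_into_windows(duration: float, window: float) -> List[Tuple[float, float]]:
    """Cut [0, duration] into fixed-length windows (an assumed segmentation rule)."""
    bounds = []
    t = 0.0
    while t < duration:
        bounds.append((t, min(t + window, duration)))
        t += window
    return bounds


def annotate(
    segments: List[Segment],
    is_relevant: Callable[[Segment], bool],
    describe: Callable[[Segment], str],
) -> List[Step]:
    """Drop irrelevant segments, then turn each remaining one into a Step.

    `is_relevant` and `describe` stand in for the filtering and generation
    models; here they are ordinary Python callables supplied by the caller.
    """
    steps = []
    for seg in segments:
        if not is_relevant(seg):
            continue  # filtering stage: skip off-topic content
        steps.append(Step(seg.start, seg.end, describe(seg)))
    return steps


# Toy usage with stub callables in place of real models.
transcripts = ["whisk two eggs", "please subscribe", "pour into the pan"]
windows = split_into_windows(25.0, 10.0)
segments = [Segment(s, e, text) for (s, e), text in zip(windows, transcripts)]
steps = annotate(
    segments,
    is_relevant=lambda seg: "subscribe" not in seg.transcript,
    describe=lambda seg: seg.transcript.capitalize(),
)
```

Passing the filter and the describer in as callables keeps the control flow separate from any particular model, so each stub could later be swapped for a real model call without changing `annotate`.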

Impact: Large-scale, detailed procedural annotations enable more sophisticated video understanding and reasoning.

Ranking rationale: An academic paper introducing a new dataset and method for video annotation.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Mingji Ge, Qirui Chen, Zeqian Li, Weidi Xie

    DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation

    arXiv:2604.26565v1. Abstract: Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significa…

  2. arXiv cs.CV TIER_1 English(EN) · Weidi Xie

    DenseStep2M: A Scalable, Training-Free Pipeline for Dense Instructional Video Annotation

    Long-term video understanding requires interpreting complex temporal events and reasoning over procedural activities. While instructional video corpora, like HowTo100M, offer rich resources for model training, they present significant challenges, including noisy ASR transcripts a…