PulseAugur

MAVEN pipeline automates data annotation for video reasoning

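Researchers have developed MAVEN, an agentic pipeline designed to automate the creation of high-quality structured annotations for video reasoning tasks. The pipeline synthesizes multi-scale event descriptions and supports agent-driven domain adaptation, which lets it redesign its own prompts and pipeline structure without human intervention. MAVEN has been used to annotate more than 5,300 traffic videos, and a model named Cosmos-Reason2-8B that was fine-tuned on this data outperformed Gemini 2.5 Pro and 3.1 Flash on a specific evaluation set.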

Impact: Automating video data annotation could accelerate VLM training and improve performance on complex reasoning tasks.

Ranking rationale: This cluster describes a new research paper on an automated annotation pipeline for video reasoning tasks.

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Wenqi Liu, Yunxiao Wang, Shijie Ma, Meng Liu, Qile Su, Tianke Zhang, Haonan Fan, Changyi Liu, Kaiyu Jiang, Jiankang Chen, Kaiyu Tang, Bin Wen, Fan Yang, Tingting Gao, Han Li, Yinwei Wei, Xuemeng Song ·

    VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos

    arXiv:2602.07801v4 Announce Type: replace-cross Abstract: In long-video understanding, conventional uniform frame sampling often fails to capture key visual evidence, leading to degraded performance and increased hallucinations. To address this, recent agentic thinking-with-video…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

    Training Vision Language Models (VLMs) for video event reasoning requires high-quality structured annotations capturing not only what happened, but when, where, why, and with what consequence, at a scale manual labelling cannot support. We present MAVEN (Multi-stage Agentic Video…

  3. arXiv cs.CV TIER_1 English(EN) · Han Zhang, Wanting Jiang, Tomasz Kornuta, Tian Zheng, Vidya Murali ·

    MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

    arXiv:2605.21917v1 Announce Type: new Abstract: Training Vision Language Models (VLMs) for video event reasoning requires high-quality structured annotations capturing not only what happened, but when, where, why, and with what consequence, at a scale manual labelling cannot supp…