PulseAugur

MAVEN pipeline automates data annotation for video reasoning

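Researchers have developed MAVEN, an agentic pipeline that automates the creation of high-quality, structured annotations for video reasoning tasks. The pipeline synthesizes multi-scale event descriptions and supports agent-driven domain adaptation, which lets it redesign its own prompts and pipeline structure without human intervention. MAVEN has been used to annotate more than 5,300 traffic videos. A model called Cosmos-Reason2-8B, fine-tuned on this data, outperformed Gemini 2.5 Pro and 3.1 Flash on a specific evaluation set.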

Impact: Automates video data annotation, which could speed up VLM training and improve performance on complex reasoning tasks.

Ranking rationale: This cluster covers a new research paper on an automated annotation pipeline for video reasoning tasks.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 3 sources.

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Wenqi Liu, Yunxiao Wang, Shijie Ma, Meng Liu, Qile Su, Tianke Zhang, Haonan Fan, Changyi Liu, Kaiyu Jiang, Jiankang Chen, Kaiyu Tang, Bin Wen, Fan Yang, Tingting Gao, Han Li, Yinwei Wei, Xuemeng Song ·

    VideoTemp-o3: Harmonizing Temporal Grounding and Video Understanding in Agentic Thinking-with-Videos

    arXiv:2602.07801v4 Announce Type: replace-cross Abstract: In long-video understanding, conventional uniform frame sampling often fails to capture key visual evidence, leading to degraded performance and increased hallucinations. To address this, recent agentic thinking-with-video…

  2. Hugging Face Daily Papers TIER_1 English(EN) ·

    MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

    Training Vision Language Models (VLMs) for video event reasoning requires high-quality structured annotations capturing not only what happened, but when, where, why, and with what consequence, at a scale manual labelling cannot support. We present MAVEN (Multi-stage Agentic Video…

  3. arXiv cs.CV TIER_1 English(EN) · Han Zhang, Wanting Jiang, Tomasz Kornuta, Tian Zheng, Vidya Murali ·

    MAVEN: A Multi-stage Agentic Annotation Pipeline for Video Reasoning Tasks

    arXiv:2605.21917v1 Announce Type: new Abstract: Training Vision Language Models (VLMs) for video event reasoning requires high-quality structured annotations capturing not only what happened, but when, where, why, and with what consequence, at a scale manual labelling cannot supp…