PulseAugur
English(EN) V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

V-Zero framework enables label-free visual reasoning and speeds up training

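Researchers have introduced V-Zero, a novel framework for fine-grained visual reasoning that operates without annotated answer labels. The method uses contrastive evidence gating to strengthen a model's ability to identify task-relevant visual evidence and to ground its reasoning in specific image regions. V-Zero pairs question-relevant image crops with negative visual views to evaluate and gate distillation, and it trains markedly faster as a result: reportedly more than 5x faster than supervised fine-tuning and more than 10x faster than reinforcement learning baselines.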
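The summary does not spell out the gating rule, so the following is only an illustrative sketch of how a contrastive gate on a distillation loss could look, not the paper's algorithm. Everything in it is an assumption: the function names, the `margin` threshold, the hard 0/1 gate, and the choice of reverse KL as the distillation term are hypothetical. The idea shown is that a student-sampled token contributes to the loss only when a teacher conditioned on the question-relevant crop supports it more strongly than a teacher conditioned on a negative view.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(p || q) for two categorical distributions given as logits."""
    lp = log_softmax(p_logits)
    lq = log_softmax(q_logits)
    return sum(math.exp(a) * (a - b) for a, b in zip(lp, lq))


def gated_distill_loss(student_logits, pos_logits, neg_logits,
                       sampled_token, margin=0.5):
    """Hypothetical contrastive gate for one token position.

    pos_logits: teacher logits given the question-relevant crop (assumed).
    neg_logits: teacher logits given a negative visual view (assumed).
    The gate opens only if the crop raises the log-probability of the
    student's sampled token by more than `margin` over the negative view.
    """
    pos_lp = log_softmax(pos_logits)[sampled_token]
    neg_lp = log_softmax(neg_logits)[sampled_token]
    gate = 1.0 if (pos_lp - neg_lp) > margin else 0.0
    # Reverse KL (student || teacher) is a common on-policy choice;
    # whether V-Zero uses it is not stated in the summary.
    loss = gate * kl_divergence(student_logits, pos_logits)
    return loss, gate


if __name__ == "__main__":
    student = [0.0, 0.0, 0.0]
    crop_teacher = [2.0, 0.0, 0.0]
    negative_teacher = [0.0, 0.0, 0.0]
    print(gated_distill_loss(student, crop_teacher, negative_teacher, 0))
```

When the negative view supports the token as strongly as the crop does, the gate closes and the position contributes no gradient, which is one way a method could avoid needing answer labels to decide which tokens to trust.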

Impact: This label-free approach could significantly reduce the cost and time of training visual reasoning models.

Ranking rationale: This cluster describes a research paper detailing a novel visual reasoning framework.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

    A novel label-free framework for visual reasoning called V-Zero is presented, which uses contrastive evidence gating to improve fine-grained visual reasoning without requiring annotated answer labels, achieving faster training than traditional methods.

  2. arXiv cs.CV TIER_1 English(EN) · Haoxiang Sun, Zhihang Yi, Langxuan Deng, Yuhao Zhou, Peiqi Jia, Jian Zhao, Li Yuan, Jiancheng Lv, Tao Wang ·

    V-Zero: Answer-Label-Free On-Policy Distillation with Contrastive Evidence Gating for Fine-Grained Visual Reasoning

    arXiv:2606.25319v1 Announce Type: new Abstract: Fine-grained visual reasoning requires multimodal large language models (MLLMs) to identify task-relevant visual evidence and ground their reasoning in local image regions. Existing agentic methods typically rely on reinforcement le…