Position: The Systemic Lack of Agency in Visual Reasoning

Paper: Vision-Language Models Lack Agency in Reasoning

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

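A new paper argues that current vision-language models (VLMs) suffer from a systemic lack of agency that hinders their implicit reasoning. The authors contend that VLMs tend to perform passive semantic retrieval rather than the active, situated reasoning that human visual understanding requires. To address this, they introduce a diagnostic benchmark for visual implicit reasoning (V-IRD) to measure this missing quadrant, and find that even mainstream VLMs struggle with autonomous visual exploration and with attending to self-directed inquiry.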

Impact: Highlights a key gap in current VLMs, which may steer future research toward more autonomous, exploratory AI systems.

Ranking rationale: This cluster contains an academic paper introducing a new model evaluation benchmark.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Yizhao Huang, Haoyang Chen, Shiqin Wang, Pohsun Huang, Jiayuan Li, Haoyuan Du, Yandong Shi, Zheng Wang, Zhixiang Wang · 2026-06-16 04:00

Position: The Systemic Lack of Agency in Visual Reasoning

arXiv:2606.14795v1 Announce Type: new Abstract: This paper argues that a systemic lack of Agency constrains the implicit reasoning capabilities of current Vision-Language Models (VLMs). Implicit reasoning refers to the ability to autonomously discover and utilize hidden visual ev…
