MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos

New Benchmark MindEdit-Bench Shows VLMs Struggle with Counterfactual Spatial Reasoning

By the PulseAugur editorial team · [2 sources] · 2026-07-01 06:19

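Researchers have introduced MindEdit-Bench, a new benchmark for evaluating object-level counterfactual spatial reasoning in vision-language models (VLMs). The benchmark is built from triplets of smartphone photos of everyday indoor scenes, with an automated pipeline that extracts a 3D scene graph from each triplet. It includes tasks that probe perception and perspective taking, along with new tasks focused on spatial editing and cross-view visibility editing, where the correct answer does not appear in the input images. Initial tests on 15 VLMs show accuracy well below human performance, pointing to a large gap in their ability to reason counterfactually about space.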
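To make the idea of a spatial edit concrete, here is a minimal Python sketch of a counterfactual question posed over a toy scene graph. It is an illustration only and not the authors' pipeline: the `SceneObject` class, the `left_of` and `move` helpers, the object names, and the camera-frame coordinates are all assumptions introduced for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SceneObject:
    """A hypothetical scene-graph node with a position in camera coordinates (metres)."""
    name: str
    x: float  # positive values are to the camera's right
    y: float  # positive values are up
    z: float  # positive values are farther from the camera


def left_of(a: SceneObject, b: SceneObject) -> bool:
    """True if `a` appears to the left of `b` (simplified: compares x only)."""
    return a.x < b.x


def move(scene: dict, name: str, dx: float = 0.0, dy: float = 0.0, dz: float = 0.0) -> dict:
    """Return a new scene with one object displaced; the original scene is unchanged."""
    obj = scene[name]
    edited = dict(scene)
    edited[name] = replace(obj, x=obj.x + dx, y=obj.y + dy, z=obj.z + dz)
    return edited


# Toy scene: the relation below is directly observable in the "photo".
scene = {
    "mug": SceneObject("mug", x=-0.4, y=0.8, z=1.5),
    "lamp": SceneObject("lamp", x=0.3, y=0.9, z=1.6),
}
observed = left_of(scene["mug"], scene["lamp"])

# Counterfactual edit: "if the mug were moved 1 m to the right, is it still left of the lamp?"
edited = move(scene, "mug", dx=1.0)
counterfactual = left_of(edited["mug"], edited["lamp"])

print(f"observed: {observed}, after edit: {counterfactual}")
```

The observed relation can be read off the input, while the post-edit relation has to be derived from the scene geometry, which is the kind of answer the summary describes as absent from the input images.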

Impact: Highlights a key gap in VLM capabilities that may steer future research toward more robust spatial understanding.

Ranking rationale: This cluster describes a new academic benchmark for evaluating AI models.



AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Leyuan Yu, Xiao Tang, Minghao Liu, Xinyuan Li, Xiaokai Bai, Sheng Zhou, Qunshu Lin, Weihao Xuan, Naoto Yokoya · 2026-07-02 04:00

MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos

arXiv:2607.00491v1 Announce Type: cross Abstract: Benchmarks for vision-language models (VLMs) mostly test observational spatial reasoning: models describe relations already visible in the input. Existing what-if tasks typically vary the observer while keeping the scene fixed. Ca…

arXiv cs.AI · Naoto Yokoya · 2026-07-01 06:19

MindEdit-Bench: Benchmarking Object-Level Counterfactual Spatial Reasoning in VLMs from In-the-Wild Photos

Benchmarks for vision-language models (VLMs) mostly test observational spatial reasoning: models describe relations already visible in the input. Existing what-if tasks typically vary the observer while keeping the scene fixed. Can VLMs instead predict the consequences of hypothe…

