New benchmark reveals MLLMs struggle with spatial reasoning

By PulseAugur Editorial Team · [1 source] · 2026-05-12 17:11

Researchers have developed PCSR-Bench, a benchmark for evaluating the spatial reasoning capabilities of Multimodal Large Language Models (MLLMs) on omnidirectional images. The benchmark, which comprises more than 84,000 question-answer pairs, reveals a significant performance gap in MLLMs: accuracy drops sharply on complex tasks such as egocentric rotation and compositional reasoning. However, reinforcement learning experiments on a 7B-scale model indicate that spatial reasoning ability is not fixed and can be improved through targeted optimization, although the gains are task-specific and sensitive to reward design.

Impact: Highlights a key bottleneck in MLLMs and suggests that targeted optimization can improve spatial reasoning capabilities.

Ranking rationale: The cluster describes a new academic paper introducing a diagnostic benchmark for evaluating MLLMs.

AI-generated summary · Google Gemini · Based on 1 source.

Sources [1]

Hugging Face Daily Papers · English (EN) · 2026-05-12 17:11

Beyond Localization: A Comprehensive Diagnosis of Perspective-Conditioned Spatial Reasoning in MLLMs from Omnidirectional Images

Multimodal Large Language Models (MLLMs) show strong visual perception, yet remain limited in reasoning about space under changing viewpoints. We study this challenge as Perspective-Conditioned Spatial Reasoning (PCSR) in 360-degree omnidirectional images, where broad scene cover…
