New EgoProx benchmark tests MLLMs on 3D spatial reasoning

By PulseAugur Editorial · [1 source] · 2026-05-26 04:00

Researchers have introduced EgoProx, a benchmark that evaluates how well multimodal large language models (MLLMs) reason about 3D proximity from an egocentric perspective. The benchmark organizes tasks into a cognitive hierarchy covering intention, exploration, exploitation, and chain-of-actions reasoning, and uses an agent-based data engine to generate diverse question-answer pairs. Current MLLMs show some spatial knowledge but struggle to apply it when answering visual questions that require spatial reasoning.

IMPACT This benchmark could help researchers identify weaknesses in MLLMs' embodied 3D spatial reasoning and improve it, a capability crucial for real-world applications.

RANK_REASON The cluster contains a research paper introducing a new benchmark for evaluating AI models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Jinzhao Li, Yinuo Chen, Dongxu Piao, Panwang Pan, Yifan Yu, Dong Wang, Honglei Yan, Liang Yue, Shaofei Wang, Yixin Chen, Siyuan Huang, Miao Liu · 2026-05-26 04:00

EgoProx: Evaluating MLLMs on Egocentric 3D Proximity Reasoning Across a Cognitive Hierarchy

arXiv:2605.24456v1 · Abstract: Humans constantly reason about 3D proximity, the relations between their body and surrounding objects, to guide perception and action in daily life. Whether multimodal large language models (MLLMs) can perform such embodied 3D reaso…
