New AirGroundBench benchmark probes spatial intelligence in MLLMs

By PulseAugur Editorial · [2 sources] · 2026-06-26 12:51

Researchers have introduced AirGroundBench, a benchmark for evaluating the spatial intelligence of multimodal large language models (MLLMs) in collaborative air-ground scenarios. It targets challenges that existing evaluations largely leave untested and that are inherent to joint unmanned aerial vehicle (UAV) and unmanned ground vehicle (UGV) operations: heterogeneous viewpoints, scale mismatches between views, and inconsistent reference frames. Evaluations on AirGroundBench show that current MLLMs handle basic spatial perception adequately but struggle significantly with cross-view alignment and transformation-intensive reasoning, weaknesses that in turn degrade their sequential decision-making in vision-language navigation tasks.
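
The terms "reference-frame inconsistencies", "scale mismatches" and "transformation-intensive reasoning" can be made concrete with a toy example. The Python sketch below is a minimal illustration under deliberately simplified assumptions, not AirGroundBench's task format or evaluation code: a flat 2D world, a straight-down (nadir), north-aligned UAV image with a known ground resolution, and a UGV with a known position and heading. All function names, coordinates and conventions are hypothetical. The sketch maps a landmark from UAV pixels to world coordinates, then rotates it into the UGV's egocentric frame to answer a left/right question, the kind of chained conversion a model has to perform implicitly when it reasons across the two views.

```python
import math


def uav_pixel_to_world(u, v, uav_xy, image_size, meters_per_pixel):
    """Map a pixel in a nadir UAV image to world (x, y) in meters.

    Hypothetical simplification: the camera points straight down, the image
    is axis-aligned with the world (x = east, y = north), and the image
    centre lies directly below the UAV. meters_per_pixel stands in for the
    scale gap between the aerial view and the ground vehicle's view.
    """
    width, height = image_size
    # Column u grows eastward (+x); row v grows southward (-y).
    x = uav_xy[0] + (u - width / 2) * meters_per_pixel
    y = uav_xy[1] - (v - height / 2) * meters_per_pixel
    return x, y


def world_to_ugv(point_xy, ugv_xy, ugv_heading_rad):
    """Express a world point in the UGV's egocentric frame.

    Returns (forward, left) in meters. Assumed convention: heading is
    measured counter-clockwise from world +x (east).
    """
    dx = point_xy[0] - ugv_xy[0]
    dy = point_xy[1] - ugv_xy[1]
    c, s = math.cos(ugv_heading_rad), math.sin(ugv_heading_rad)
    # Rotate the offset by -heading (transpose of the UGV's rotation matrix).
    forward = c * dx + s * dy
    left = -s * dx + c * dy
    return forward, left


def relative_side(left, tol=1e-9):
    """Turn the lateral offset into a left/right answer."""
    if left > tol:
        return "left"
    if left < -tol:
        return "right"
    return "center"


# A landmark seen by the UAV, queried from a UGV facing north (pi/2).
landmark = uav_pixel_to_world(700, 300, uav_xy=(0.0, 0.0),
                              image_size=(1000, 1000), meters_per_pixel=0.1)
forward, left = world_to_ugv(landmark, ugv_xy=(10.0, 0.0),
                             ugv_heading_rad=math.pi / 2)
print(f"{forward:.1f} m ahead, {abs(left):.1f} m to the {relative_side(left)}")
# prints: 20.0 m ahead, 10.0 m to the right
```

Changing only ugv_heading_rad to 0 (facing east) places the same landmark 10 m ahead and 20 m to the left, which shows how an answer that is correct in one frame becomes wrong in another if the transformation is skipped.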

IMPACT: Highlights key limitations in current MLLMs' spatial reasoning, guiding future research toward more robust embodied AI.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CV TIER_1 English(EN) · Haotian Li, Yida Wang, Leyuan Wang, Jinshan Lai, Keyang Wang, Zonghao Guo, Qiang Ma, Liuyu Xiang, Jianwei Hu, Zhaofeng He · 2026-06-29 04:00

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

arXiv:2606.28049v1 · Announce type: new
Abstract: In recent years, multimodal large language models (MLLMs) have shown strong potential for embodied intelligence, yet their ability to maintain geometrically consistent spatial understanding across heterogeneous views remains under-e…

arXiv cs.CV TIER_1 English(EN) · Zhaofeng He · 2026-06-26 12:51

AirGroundBench: Probing Spatial Intelligence in Multimodal Large Models under Heterogeneous Multi-View Embodied Collaboration

In recent years, multimodal large language models (MLLMs) have shown strong potential for embodied intelligence, yet their ability to maintain geometrically consistent spatial understanding across heterogeneous views remains under-evaluated. Existing benchmarks largely focus on s…
