PulseAugur
ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

New benchmark ERGeoBench tests the embodied geo-localization ability of multimodal LLMs

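Researchers have introduced ERGeoBench, a new benchmark for evaluating the geo-localization ability of multimodal large language models (MLLMs) acting as embodied agents. The benchmark uses more than 2,200 street-view panoramas to evaluate models in single-view, panoramic-view, and embodied-view settings. The evaluation shows that although current MLLMs grasp high-level geographic concepts, they still struggle with precise metric localization and with keeping their spatial estimates consistent across views, which underscores the need to integrate perception and reasoning.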

Impact: Provides a standardized evaluation for embodied AI agents, advancing work on spatial reasoning and geo-localization.

Ranking rationale: This cluster contains a research paper introducing a new benchmark for evaluating AI models.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI · Kaiwen Xue, Tao Wei, Guoxin Zhang, Zhonghong Ou, Kaoyan Lu, Yu Feng, Yifan Zhu, Haoran Luo

    ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

    arXiv:2605.31251v1 Announce Type: cross Abstract: Multimodal large language models (MLLMs) have shown strong potential as embodied agents, yet embodied geo-localization remains underexplored due to the lack of fine-grained evaluation. We introduce ERGeoBench, a diagnostic benchma…

  2. arXiv cs.AI · Haoran Luo

    ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models

    Multimodal large language models (MLLMs) have shown strong potential as embodied agents, yet embodied geo-localization remains underexplored due to the lack of fine-grained evaluation. We introduce ERGeoBench, a diagnostic benchmark for vision-driven embodied geo-localization. ER…