PulseAugur
Dissecting Embodied Abilities in Multimodal Language Models through Skill-level Evaluation and Diagnosis

New benchmark shows perception and spatiotemporal modeling are weak points of MLLMs

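Researchers have introduced BEAR, a new benchmark for evaluating and diagnosing the skill-level abilities of embodied multimodal large language models (MLLMs). The benchmark decomposes embodied tasks into 14 distinct atomic skills, which gives finer-grained insight into where models fail than earlier task-level evaluations. Evaluation on BEAR shows that perceptual limitations and unstable spatiotemporal modeling are major bottlenecks for current MLLMs. To address these, the team developed BEAR-Agent, a conversational agent that augments MLLMs with visual and spatial reasoning tools; it showed significant performance gains in both benchmark tests and robot experiments.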

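Impact: Identifies key weaknesses in embodied AI and points future research toward improving the perception and spatiotemporal reasoning of robotic agents.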

Ranking rationale: This cluster contains one academic paper that introduces a new benchmark and evaluation framework for multimodal language models.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CV · Yu Qi, Haibo Zhao, Ziyu Guo, Siyuan Ma, Ziyan Chen, Yaokun Han, Renrui Zhang, Zitiantao Lin, Yizhe Zhu, Shiji Xin, Yijian Huang, Boce Hu, Kai Cheng, Peiheng Wang, Jiazheng Liu, Jiayi Zhang, Yizhe Zhu, Wenqing Wang, Yiran Qin, Haojie Huang, Lawson L. S. W…

    Dissecting Embodied Abilities in Multimodal Language Models through Skill-level Evaluation and Diagnosis

    arXiv:2510.08759v2 Announce Type: replace Abstract: Understanding the capability bottlenecks of embodied multimodal large language models (MLLMs) is crucial for improving embodied agents. However, existing embodied benchmarks mainly focus on task-level evaluation and fail to prov…