AsgardBench: A benchmark for visually grounded interactive planning

Microsoft Research launches AsgardBench to test how well AI agents adapt their plans to visual feedback

By the PulseAugur editorial team · [1 source] · 2026-03-26 19:02

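Microsoft Research has released AsgardBench, a new benchmark designed to evaluate how well embodied AI agents adjust their plans in response to visual feedback. The benchmark comprises 108 task instances across 12 types and requires agents to revise their actions mid-task based on visual observations. By simulating scenarios in which environmental changes force a plan to be adjusted, AsgardBench isolates the key capability of interactive planning and goes beyond simple perception or navigation tests.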

Ranking rationale: the submission describes a new benchmark for evaluating AI agents and falls under the "Research" category.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

Microsoft Research · Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao · 2026-03-26 19:02

AsgardBench: A benchmark for visually grounded interactive planning

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example, when the mug it was tasked to wash is already clean, or the sink is full of other items. This is the domain o…
