Microsoft Research launches AsgardBench to test AI agents' visual planning adaptation

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 1 source

Microsoft Research has introduced AsgardBench, a benchmark that evaluates whether embodied AI agents can adapt their plans in response to visual feedback. It comprises 108 task instances across 12 task types, each requiring the agent to revise its actions as the task progresses based on what it observes. By simulating scenarios in which environmental changes force plan adjustments, AsgardBench isolates interactive planning rather than testing perception or navigation alone.

COVERAGE [1]

Microsoft Research TIER_1 · Andrea Tupini, Lars Liden, Reuben Tan, Yu Wang, Jianfeng Gao · 2026-03-26 19:02

AsgardBench: A benchmark for visually grounded interactive planning

Imagine a robot tasked with cleaning a kitchen. It needs to observe its environment, decide what to do, and adjust when things don’t go as expected, for example, when the mug it was tasked to wash is already clean, or the sink is full of other items. This is the domain o…
