Microsoft Research has introduced AsgardBench, a benchmark for evaluating how well embodied AI agents adapt their plans in response to visual feedback. It comprises 108 task instances across 12 task types, each requiring the agent to revise its actions as the task progresses based on what it observes. Rather than testing perception or navigation alone, AsgardBench isolates interactive planning by simulating scenarios in which changes in the environment force the agent to adjust its plan.
Summary written by gemini-2.5-flash-lite from 1 source.