RescueBench: Can Embodied Agents Save Lives in the Wild?
Researchers have introduced RescueBench, a new benchmark designed to evaluate embodied agents in realistic search-and-rescue scenarios. The benchmark simulates a four-stage pipeline (exploration, target rescue, memory-guided return, and handoff) to assess how failures compound across a complex workflow. Current baseline agents struggle significantly, with autonomous exploration identified as the primary failure mode and spatial memory as a secondary bottleneck.
AI IMPACT: This benchmark could drive progress in embodied AI for complex, real-world applications like disaster response.