WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents
Researchers have introduced WildRoadBench, a new benchmark for evaluating how well vision-language models (VLMs) and LLM-driven agents identify road damage in aerial imagery. The benchmark includes two tracks: one in which VLMs localize damage through prompted visual grounding, and another in which autonomous agents carry out tasks such as web searching and code generation within a limited budget. Current frontier models show promise but still fall short of reliable performance, and open-source models and agents lag significantly behind.
AI IMPACT: This benchmark could drive improvements in AI's ability to assess infrastructure damage from aerial data.