Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 1w

WildRoadBench: A Wild Aerial Road-Damage Grounding Benchmark for Vision-Language Models and Autonomous Agents

Researchers have introduced WildRoadBench, a benchmark for evaluating how well vision-language models (VLMs) and LLM-driven agents identify road damage in aerial imagery. The benchmark has two tracks: in the first, VLMs localize damage through prompted visual grounding; in the second, autonomous agents perform tasks such as web searching and code generation within a limited budget. Current frontier models show promise but are not yet reliable, and open-source models and agents lag significantly behind.

IMPACT This benchmark could drive improvements in AI's ability to assess infrastructure damage from aerial data.

Vision-Language Models
Bingnan Liu
WildRoadBench
LLM-driven agents