A new research paper, "Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City," investigates how well visual language models (VLMs) and human drivers generalize to new geographic locations in autonomous driving scenarios. The study used dashcam footage from Lima and New York City, posing questions in four categories (factual, ratings, counterfactual, and reasoning) to human drivers from both cities and to various VLMs. The findings indicate that human and VLM responses differ by question type, but that neither humans nor VLMs showed significant performance differences across geography, possibly because the test cases were highly out-of-distribution.
AI IMPACT This research highlights the challenges in generalizing VLM performance for autonomous driving across diverse geographic locations, suggesting that further work is needed for robust real-world application.
RANK_REASON Research paper detailing a new benchmark for VLM and human performance in autonomous driving scenarios.
Read on Hugging Face Daily Papers →
AI-generated summary · Google Gemini · from 1 source.