Humans and VLMs show similar driving generalization across cities

By PulseAugur Editorial · [1 source] · 2026-06-18 00:00

A new research paper, "Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City," investigates how well visual language models (VLMs) and human drivers generalize to new geographic locations in autonomous driving scenarios. The study used dashcam footage from Lima and New York City and posed questions in four categories (factual, ratings, counterfactual, and reasoning) to human drivers from both cities and to several VLMs. The findings indicate that human and VLM responses differ by question type, but neither group's performance varied significantly with geography, possibly because the test cases were highly out-of-distribution.

IMPACT This research highlights the challenges in generalizing VLM performance for autonomous driving across diverse geographic locations, suggesting further work is needed for robust real-world application.

RANK_REASON Research paper detailing a new benchmark for VLM and human performance in autonomous driving scenarios.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-18 00:00

Robusto-2: Benchmarking Humans & VLMs for Autonomous Driving in Lima & New York City

Research examines how self-driving car systems and humans perform on visual question answering tasks across different geographic locations, revealing that both human and AI responses diverge based on question types but show similar performance regardless of location.
