New benchmark reveals text-to-image models struggle with geographic street-view accuracy

By PulseAugur Editorial · [1 source] · 2026-06-22 17:53

Researchers have introduced GeoFidelity-Bench, a benchmark for evaluating how geographically accurate text-to-image models are when generating street-view images. The benchmark uses a curated dataset of 7,117 images from Mapillary, covering 109 specific road segments across 25 cities worldwide, sourced from OpenStreetMap data. Initial evaluations of six open-weight models show that adding street and neighborhood names to the prompt improves retrieval accuracy by approximately 5.5 percentage points over city-only prompts. The models still struggle to generate images that match a specific road segment, which indicates a gap between generating a plausible local scene and depicting a precise location.
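To make the "retrieval accuracy" and "percentage points" wording concrete, here is a minimal sketch of how such a comparison could be computed. It is an illustration under stated assumptions, not the paper's actual protocol: the nearest-neighbor matching by cosine similarity, the function names, the two-dimensional embeddings, and the toy numbers are all hypothetical and were chosen only to show the arithmetic.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top1_retrieval_accuracy(generated, panel):
    """Fraction of generated images whose nearest reference image
    belongs to the road segment that was requested in the prompt.

    generated: list of (target_segment_id, embedding)
    panel:     list of (segment_id, embedding) for real reference images
    (Hypothetical setup; the benchmark's real matching rule may differ.)
    """
    hits = 0
    for target, emb in generated:
        best_segment, _ = max(panel, key=lambda ref: cosine(emb, ref[1]))
        if best_segment == target:
            hits += 1
    return hits / len(generated)


def percentage_point_gap(acc_a, acc_b):
    """Difference between two accuracies, expressed in percentage points."""
    return 100.0 * (acc_a - acc_b)


# Toy data, invented for illustration only (not results from the paper).
panel = [("seg_a", [1.0, 0.0]), ("seg_b", [0.0, 1.0])]
city_only = [("seg_a", [0.9, 0.1]), ("seg_b", [0.8, 0.2])]
street_level = [("seg_a", [0.9, 0.1]), ("seg_b", [0.1, 0.9])]

acc_city = top1_retrieval_accuracy(city_only, panel)
acc_street = top1_retrieval_accuracy(street_level, panel)
gap = percentage_point_gap(acc_street, acc_city)
```

A gap in percentage points is an absolute difference between two accuracies, not a relative improvement, so the reported figure of roughly 5.5 points reads as "about 5.5 more correct retrievals per 100 generated images".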

IMPACT This benchmark highlights limitations in current text-to-image models for precise geographic generation, potentially guiding future research towards more location-aware synthesis.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Siru Tao · 2026-06-22 17:53

GeoFidelity-Bench: Evaluating Segment-Level Geographic Fidelity in Text-to-Image Street-View Generation

Text-to-image models can generate visually plausible city streets, but whether their outputs correspond to a requested road segment rather than a generic city prior remains unclear. We introduce GeoFidelity-Bench, a reference-panel benchmark for segment-conditioned geographic fid…
