GuideDog dataset aids blind and low-vision navigation with egocentric multimodal data

By PulseAugur Editorial · [1 source] · 2026-05-01 04:00

Researchers have introduced GuideDog, a new dataset designed to support the development of multimodal large language models (MLLMs) for blind and low-vision (BLV) individuals. The dataset comprises 22,000 image-description pairs from real-world pedestrian scenes across 46 countries, annotated with a human-AI pipeline intended to make annotation more scalable. The authors also release GuideDogQA, an 818-sample benchmark that evaluates object recognition and depth perception; current MLLMs show clear limitations on both tasks.

IMPACT This dataset could accelerate the development of assistive navigation tools for the visually impaired by providing much-needed real-world data.

RANK_REASON The cluster describes a new academic paper introducing a dataset and benchmark.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Junhyeok Kim, Jaewoo Park, Junhee Park, Sangeyl Lee, Jiwan Chung, Jisung Kim, Ji Hoon Joung, Youngjae Yu · 2026-05-01 04:00

GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

arXiv:2503.12844v2 Announce Type: replace Abstract: For people affected by blindness and low vision (BLV), safe and independent navigation remains a major challenge, impacting over 2.2 billion individuals worldwide. Although multimodal large language models (MLLMs) offer new oppo…
