PulseAugur

GuideDog dataset aids blind and low-vision navigation with egocentric multimodal data

Researchers have introduced GuideDog, a new dataset designed to support the development of multimodal large language models (MLLMs) for blind and low-vision (BLV) individuals. The dataset comprises 22,000 image-description pairs from real-world pedestrian scenes across 46 countries, collected through a human-AI pipeline for scalable annotation. A companion benchmark, GuideDogQA, contains 818 samples that evaluate object recognition and depth perception, areas where current MLLMs still show clear limitations.
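The GuideDogQA benchmark pairs scene images with questions about what objects are present and how far away they are. As a rough illustration only, the sketch below shows how such image-question-answer samples might be scored with exact-match accuracy; the file name, JSON field names, and the answer_fn model hook are assumptions for this sketch, not the paper's actual format or metric.

```python
# Minimal sketch of a GuideDogQA-style evaluation loop.
# Assumptions (not from the paper): samples live in a JSON list of
# {"image", "question", "answer"} records, and answer_fn wraps an MLLM
# that maps (image_path, question) -> answer string.
import json
from typing import Callable


def evaluate(samples_path: str, answer_fn: Callable[[str, str], str]) -> float:
    """Score a model's answers against ground truth by exact match."""
    with open(samples_path, encoding="utf-8") as f:
        samples = json.load(f)

    correct = 0
    for s in samples:
        prediction = answer_fn(s["image"], s["question"])
        correct += prediction.strip().lower() == s["answer"].strip().lower()
    return correct / len(samples)


# Usage (hypothetical): accuracy = evaluate("guidedogqa.json", my_mllm_answer)
```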

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT This dataset could accelerate the development of assistive navigation tools for the visually impaired by providing much-needed real-world data.

RANK_REASON The cluster describes a new academic paper introducing a dataset and benchmark.

Read on arXiv cs.CV →

COVERAGE [1]

  1. arXiv cs.CV TIER_1 · Junhyeok Kim, Jaewoo Park, Junhee Park, Sangeyl Lee, Jiwan Chung, Jisung Kim, Ji Hoon Joung, Youngjae Yu

    GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance

    arXiv:2503.12844v2 (Announce Type: replace). Abstract: For people affected by blindness and low vision (BLV), safe and independent navigation remains a major challenge, impacting over 2.2 billion individuals worldwide. Although multimodal large language models (MLLMs) offer new oppo…