New framework GH-ESD uses LLMs to find vision model errors · arXiv research

By PulseAugur Editorial · [1 source] · 2026-06-19 04:00

Researchers have developed GH-ESD, a framework for discovering systematic failures in instance-level vision tasks such as object detection and segmentation. Unlike previous methods, which focused on image-level classification, GH-ESD uses Large Language Models (LLMs) and Vision-Language Models (VLMs) to generate and verify hypotheses about the relational and spatially grounded visual patterns that cause errors. The work also introduces GESD, a benchmark dataset for instance-level error slice discovery, on which GH-ESD is reported to outperform existing methods.
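
The idea of an error slice can be made concrete with a small sketch. This is an illustration only, not GH-ESD's implementation: the `Instance` record, its fields, the `score_slice` helper, and the toy data are all hypothetical. The hand-written predicate stands in for a hypothesis that, in the framework as summarized here, an LLM would propose and a VLM would verify.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Instance:
    """One evaluated object instance (hypothetical record layout)."""
    label: str
    box_area_frac: float   # bounding-box area divided by image area
    overlaps_other: bool   # True if the box overlaps another instance
    is_error: bool         # True if the model missed or mislocalized it


def error_rate(instances: Iterable[Instance]) -> float:
    """Fraction of instances marked as errors (0.0 for an empty input)."""
    items = list(instances)
    if not items:
        return 0.0
    return sum(i.is_error for i in items) / len(items)


def score_slice(
    instances: Iterable[Instance],
    predicate: Callable[[Instance], bool],
    min_support: int = 2,
) -> Optional[dict]:
    """Compare the error rate inside a candidate slice with the overall rate.

    Returns None when too few instances match for the comparison to mean much.
    """
    items = list(instances)
    members = [i for i in items if predicate(i)]
    if len(members) < min_support:
        return None
    slice_error = error_rate(members)
    overall_error = error_rate(items)
    return {
        "support": len(members),
        "slice_error": slice_error,
        "overall_error": overall_error,
        "gap": slice_error - overall_error,
    }


# Toy evaluation results, invented for illustration.
instances = [
    Instance("person", 0.010, True, True),
    Instance("person", 0.020, True, True),
    Instance("person", 0.015, True, False),
    Instance("person", 0.300, False, False),
    Instance("car", 0.250, False, False),
    Instance("car", 0.010, False, True),
    Instance("dog", 0.200, False, False),
    Instance("dog", 0.400, True, False),
]


def small_and_overlapping(inst: Instance) -> bool:
    """Candidate hypothesis: small instances that overlap another object."""
    return inst.box_area_frac < 0.05 and inst.overlaps_other


result = score_slice(instances, small_and_overlapping)
print(result)
```

A large positive `gap` with adequate `support` marks the predicate as a candidate error slice. The predicate here is both spatial (box size) and relational (overlap with another instance), which is the kind of instance-level pattern that image-level methods would struggle to express.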

IMPACT Introduces a novel method for identifying and understanding failures in vision models, potentially leading to more robust AI systems.

RANK_REASON Academic paper introducing a new method and dataset for computer vision research.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Wei Zhang, Chaoqun Wang, Zixuan Guan, Sam Kao, Pengfei Zhao, Peng Wu, Sifeng He · 2026-06-19 04:00

GH-ESD: Grounded Hypothesis-Driven Error Slice Discovery for Instance-Level Vision Tasks

arXiv:2512.24592v2 Announce Type: replace Abstract: Systematic failures of vision models on semantically coherent subsets, known as error slices, reveal limitations in robustness and evaluation. Existing slice discovery approaches largely model slices as clusters in representatio…
