TOOL · arXiv cs.AI · 8h

CleanPatrick: A Benchmark for Image Data Cleaning

Researchers have introduced CleanPatrick, a benchmark for evaluating image data cleaning techniques. Built on the Fitzpatrick17k dermatology dataset, it addresses limitations of existing evaluations by using real-world noise and human annotations. CleanPatrick formalizes data cleaning as a ranking task, in which a method orders samples by how likely each is to contain a data quality issue. Benchmarking existing methods on it shows that self-supervised representations are effective at detecting near-duplicates, while detecting label errors remains a challenge.
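To make the ranking framing concrete, the sketch below scores a hypothetical cleaning method that assigns each image an issue score, then measures how well the resulting order places human-flagged images near the top. The function names, the toy scores, and the choice of average precision and precision@k as metrics are illustrative assumptions, not details taken from the brief.

```python
def ranked_indices(scores):
    """Return sample indices ordered from most to least suspicious."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def average_precision(scores, is_issue):
    """Mean of precision values at the rank of each true issue."""
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(ranked_indices(scores), start=1):
        if is_issue[idx]:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def precision_at_k(scores, is_issue, k):
    """Fraction of the top-k ranked samples that are true issues."""
    top_k = ranked_indices(scores)[:k]
    return sum(is_issue[i] for i in top_k) / k


# Toy data (assumed): issue scores from a method, and human annotations.
scores = [0.9, 0.1, 0.8, 0.3, 0.7]
is_issue = [True, False, False, False, True]

ap = average_precision(scores, is_issue)
p_at_2 = precision_at_k(scores, is_issue, 2)
print(f"AP = {ap:.3f}, P@2 = {p_at_2:.2f}")
```

Under this framing, a method is judged on the order it produces rather than on a fixed threshold, so methods with differently scaled scores can be compared directly.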

IMPACT Provides a standardized evaluation for data cleaning methods, potentially improving the robustness of future AI models trained on image data.

Fitzpatrick17k
Fabian Gröger
CleanPatrick