CleanPatrick: A Benchmark for Image Data Cleaning
Researchers have introduced CleanPatrick, a benchmark for evaluating image data cleaning techniques. Built on a large dermatology dataset, it addresses the limitations of earlier evaluations by using real-world noise and human annotations. CleanPatrick formalizes data cleaning as a ranking task and has been used to benchmark a range of existing methods. The results show that self-supervised representations are effective at detecting near-duplicates, while detecting label errors remains a challenge.
AI IMPACT: Provides a standardized evaluation for data cleaning methods, potentially improving the robustness of future AI models trained on image data.
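
To make the "data cleaning as a ranking task" framing concrete, here is a minimal sketch of how a ranking of suspect images could be scored against human annotations. It uses average precision, a common ranking metric; the summary above does not say which metrics CleanPatrick reports, so the choice of metric, the function name `average_precision`, and the toy scores are all assumptions for illustration.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], is_issue: Sequence[bool]) -> float:
    """Average precision of a ranking that sorts samples by descending score.

    scores   -- hypothetical per-image "likely to be a data issue" scores
                produced by some cleaning method (higher = more suspicious)
    is_issue -- human annotations: True if the image really is an issue
                (e.g. a near-duplicate or a label error)
    """
    if len(scores) != len(is_issue):
        raise ValueError("scores and is_issue must have the same length")

    # Rank sample indices from most to least suspicious.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if is_issue[idx]:
            hits += 1
            # Precision among the top `rank` samples, taken at each true issue.
            precision_sum += hits / rank

    # By convention here, return 0.0 when there are no annotated issues.
    return precision_sum / hits if hits else 0.0


# Toy example (made-up numbers): four images, two of them annotated as issues.
toy_scores = [0.9, 0.8, 0.1, 0.7]
toy_labels = [True, False, False, True]
ap = average_precision(toy_scores, toy_labels)
```

A method that places every annotated issue at the top of its ranking scores 1.0 under this metric, which is why a ranking formulation lets very different cleaning methods be compared on the same scale.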