Researchers explore data symmetries to improve noisy dataset selection for ML

By PulseAugur Editorial · [1 source] · 2026-05-05 04:00

Researchers have proposed a method for identifying an optimal subset of training data when the labels are noisy. The approach uses data symmetries and invariance properties to make k-nearest neighbors (k-NN) more accurate at selecting low-noise samples. The findings suggest that exploiting these symmetries can yield performance comparable to training on noise-free datasets, even in high-dimensional settings.
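As a rough illustration of the idea summarized above, the sketch below scores each sample by how many of its k nearest neighbours share its (possibly noisy) label, with neighbours found in a symmetry-invariant feature space. This is not the paper's algorithm. The toy data, the choice of rotation as the symmetry, the radius as the invariant feature, the helper names `knn_agreement` and `select_low_noise`, and the values of `k` and `threshold` are all assumptions made for illustration.

```python
import numpy as np

def knn_agreement(features, labels, k):
    """Fraction of each sample's k nearest neighbours that share its label."""
    # Pairwise squared Euclidean distances between all samples.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    # Indices of the k smallest distances in each row (order within the k is arbitrary).
    nn_idx = np.argpartition(dist, k, axis=1)[:, :k]
    return (labels[nn_idx] == labels[:, None]).mean(axis=1)

def select_low_noise(features, labels, k=15, threshold=0.7):
    """Indices of samples whose neighbourhood mostly agrees with their label."""
    return np.flatnonzero(knn_agreement(features, labels, k) >= threshold)

# Toy data (assumed): the clean label depends only on the distance from the
# origin, so the labelling is invariant under rotations of the plane.
rng = np.random.default_rng(0)
n = 600
radius = rng.uniform(0.0, 2.0, n)
angle = rng.uniform(0.0, 2.0 * np.pi, n)
x = np.column_stack([radius * np.cos(angle), radius * np.sin(angle)])
clean = (radius > 1.0).astype(int)

# Inject symmetric label noise by flipping roughly 20% of the labels.
flip = rng.random(n) < 0.2
noisy = np.where(flip, 1 - clean, clean)

# Rotation-invariant representation (assumed): the norm of each point.
invariant = np.linalg.norm(x, axis=1, keepdims=True)

keep = select_low_noise(invariant, noisy)
noise_before = flip.mean()       # noise rate in the full dataset
noise_after = flip[keep].mean()  # noise rate in the selected subset
```

Comparing `noise_before` with `noise_after` shows how much the selection step filters out flipped labels on this toy problem. Real datasets would need an invariant representation suited to their own symmetries, and the dense pairwise distance matrix used here only scales to small sample counts.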

IMPACT Improves robustness of models trained on potentially noisy real-world datasets.

RANK_REASON Academic paper detailing a novel method for data selection in machine learning.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Kumar Shubham, Pavan Karjol, Kiran M K, Prathosh AP · 2026-05-05 04:00

Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise

arXiv:2605.01874v1 · Announce Type: new · Abstract: The performance of machine learning models often relies on large labeled datasets; however, data collected from diverse sources can contain label noise. Recent work has shown that, in noisy settings, there may exist a subset of the …