Researchers explore data symmetries to improve noisy dataset selection for ML

By PulseAugur Editorial Team · [1 source] · 2026-05-05 04:00

Researchers have developed a method for identifying optimal subsets of training data under label noise. The approach leverages data symmetries and invariance properties to improve the accuracy of k-nearest neighbors (k-NN) when selecting low-noise samples. The findings suggest that exploiting these symmetries can yield performance comparable to training on noise-free data, even in high-dimensional settings.
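The summary does not spell out the paper's algorithm, so the following is only a rough illustration of the general idea, not the authors' method. It assumes a toy setting where labels depend only on a point's distance from the origin (a rotational symmetry), maps points to a rotation-invariant feature, and keeps samples whose label agrees with most of their k nearest neighbors. The names radial_features and knn_agreement_mask, the 0.5 threshold, and the toy data are all hypothetical choices made for this sketch.

```python
import numpy as np


def radial_features(points):
    """Map 2-D points to a rotation-invariant feature: distance from origin.

    Assumption for this sketch: the label depends only on this radius.
    """
    points = np.asarray(points, dtype=float)
    return np.linalg.norm(points, axis=1, keepdims=True)


def knn_agreement_mask(features, labels, k=3):
    """Return a boolean mask of samples that look low-noise.

    A sample is kept when more than half of its k nearest neighbors
    (excluding itself) carry the same label. This is a generic k-NN
    agreement filter, used here as a stand-in for the paper's selector.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances, shape (n, n).
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # A sample must not count as its own neighbor.
    np.fill_diagonal(dists, np.inf)
    neighbors = np.argsort(dists, axis=1)[:, :k]
    # Fraction of neighbors sharing each sample's label.
    agreement = (labels[neighbors] == labels[:, None]).mean(axis=1)
    return agreement > 0.5


if __name__ == "__main__":
    # Two rings (radius 1 and radius 5); one label on each ring is flipped.
    pts = np.array([[1, 0], [0, 1], [-1, 0], [0, -1],
                    [5, 0], [0, 5], [-5, 0], [0, -5]], dtype=float)
    noisy_labels = np.array([0, 0, 0, 1, 1, 1, 1, 0])
    keep = knn_agreement_mask(radial_features(pts), noisy_labels, k=3)
    print("selected indices:", np.flatnonzero(keep))
```

Because the radius does not change under rotation, the neighbor sets in this sketch are the same for any rotated copy of the data, which is the sense in which an invariant representation can make k-NN selection more reliable.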

Impact: Improves the robustness of models trained on potentially noisy real-world datasets.

Ranking rationale: Academic paper detailing a novel method for data selection in machine learning.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG · TIER_1 · English (EN) · Kumar Shubham, Pavan Karjol, Kiran M K, Prathosh AP · 2026-05-05 04:00

Leveraging Data Symmetries to Select an Optimal Subset of Training Data under Label Noise

arXiv:2605.01874v1 Announce Type: new Abstract: The performance of machine learning models often relies on large labeled datasets; however, data collected from diverse sources can contain label noise. Recent work has shown that, in noisy settings, there may exist a subset of the …

