PulseAugur

New frameworks tackle corrupted labels in ML datasets

Two new research papers posted to arXiv introduce frameworks for identifying and correcting corrupted labels in machine learning datasets. CANOLA and Relabeler both aim to improve model performance by refining noisy training data. CANOLA combines noise-aware learning with iterative soft-label refinement, while Relabeler uses local and global relationships among data points to detect and correct mislabeled examples. In the authors' experiments, both methods report significant improvements over existing techniques, which in turn lead to better downstream task performance.
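The summary does not describe either algorithm in detail, so the following is only a generic illustration of two ideas mentioned above: neighbourhood-based detection of suspicious labels and iterative soft-label refinement. It is not CANOLA's or Relabeler's actual method, and it covers only *local* (nearest-neighbour) relationships. It is a minimal sketch in the style of k-nearest-neighbour label spreading. It assumes numeric feature vectors, Euclidean distance, and integer class labels. The function names (`knn_indices`, `refine_soft_labels`, `detect_and_correct`) and the parameters `k`, `alpha`, `iterations` and `threshold` are hypothetical choices made for this example.

```python
import math

# Hypothetical helpers for illustration only; not the API of CANOLA or Relabeler.


def knn_indices(points, i, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i itself."""
    ranked = sorted(
        (math.dist(points[i], q), j) for j, q in enumerate(points) if j != i
    )
    return [j for _, j in ranked[:k]]


def refine_soft_labels(points, labels, num_classes, k=3, alpha=0.3, iterations=10):
    """Iteratively refine soft labels, label-spreading style.

    Each round, a point's soft label becomes a mix of its observed one-hot label
    (weight alpha) and the mean soft label of its k nearest neighbours
    (weight 1 - alpha). Every row stays a probability distribution.
    """
    one_hot = [[1.0 if c == y else 0.0 for c in range(num_classes)] for y in labels]
    neighbours = [knn_indices(points, i, k) for i in range(len(points))]
    soft = [row[:] for row in one_hot]
    for _ in range(iterations):
        # Synchronous update: the right-hand side reads the previous round's `soft`.
        soft = [
            [
                alpha * one_hot[i][c]
                + (1 - alpha) * sum(soft[j][c] for j in neighbours[i]) / k
                for c in range(num_classes)
            ]
            for i in range(len(points))
        ]
    return soft


def detect_and_correct(points, labels, num_classes, threshold=0.5, **kwargs):
    """Flag points whose observed label keeps less than `threshold` refined
    probability, and relabel them with the most probable refined class."""
    soft = refine_soft_labels(points, labels, num_classes, **kwargs)
    flagged, corrected = [], []
    for i, (y, probs) in enumerate(zip(labels, soft)):
        best = max(range(num_classes), key=lambda c: probs[c])
        if probs[y] < threshold and best != y:
            flagged.append(i)
            corrected.append(best)
        else:
            corrected.append(y)
    return flagged, corrected


# Toy data: two well-separated clusters; index 1 carries a deliberately flipped label.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels = [0, 1, 0, 0, 1, 1, 1, 1]

flagged, corrected = detect_and_correct(points, labels, num_classes=2)
print(flagged)    # [1]
print(corrected)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Anchoring each update to the observed label (the `alpha` term) keeps the refined labels from drifting arbitrarily far from the data. The neighbour average lets a single flipped label inside an otherwise consistent cluster be outvoted. In this toy run, the flipped point keeps only about 0.43 probability for its observed class, so it is flagged and corrected.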

IMPACT Improved data quality from these frameworks could lead to more robust and accurate AI models across various applications.

RANK_REASON Two academic papers published on arXiv introduce novel methods for correcting corrupted labels in machine learning datasets.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI · TIER_1 · English (EN) · Ha-Linh Nguyen, Hong-Anh Nguyen, Minh-Duc La, Phong Lam, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

    Noise-Aware Framework for Correcting Corrupted Labels

    arXiv:2606.11695v1 Announce Type: cross Abstract: High-quality labeled data is essential for training reliable ML/DL models. However, real-world datasets often contain a considerable proportion of corrupted labels, which can severely degrade model performance. To address this pro…

  2. arXiv cs.LG · TIER_1 · English (EN) · Ha-Linh Nguyen, Hong-Anh Nguyen, Minh-Duc La, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo

    A Data-Centric Framework for Detecting and Correcting Corrupted Labels

    arXiv:2606.11699v1 Announce Type: new Abstract: The performance of machine learning and deep learning models largely depends on the quality of the training data. However, the quality of the real-world datasets is often compromised by noisy labels, which can substantially degrade …