PulseAugur

New SemiPrune method prunes datasets with minimal labeled data

Researchers have developed SemiPrune, a method for pruning large deep learning datasets when only a small fraction of the examples is labeled. Most existing pruning methods require fully labeled data, which is often costly to obtain. SemiPrune combines a small labeled set with semi-supervised learning to generate pseudo-labels for the unlabeled examples, which makes supervised pruning methods applicable. It then estimates example difficulty from the training dynamics obtained with these pseudo-labels, which the authors report leads to more accurate coreset selection and state-of-the-art performance on several specialized datasets.
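The pipeline described above (pseudo-label, score difficulty from training dynamics, select a coreset) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the difficulty score used here (one minus the mean probability assigned to the pseudo-label across epochs), the choice to keep the hardest examples, and all function names and toy numbers are assumptions made purely for illustration.

```python
import numpy as np

def pseudo_label(teacher_probs):
    """Assign each unlabeled example the class its (hypothetical) semi-supervised model finds most likely."""
    return teacher_probs.argmax(axis=1)

def difficulty_from_dynamics(epoch_probs, labels):
    """Assumed difficulty proxy: 1 - mean probability of the pseudo-label over epochs.

    epoch_probs has shape (epochs, n_examples, n_classes); labels has shape (n_examples,).
    """
    n = labels.shape[0]
    # Probability given to each example's pseudo-label at every epoch: shape (epochs, n_examples).
    p_label = epoch_probs[:, np.arange(n), labels]
    return 1.0 - p_label.mean(axis=0)

def select_coreset(scores, keep_fraction):
    """Keep the highest-scoring (hardest) examples; this selection rule is an assumption."""
    k = max(1, int(round(keep_fraction * scores.shape[0])))
    order = np.argsort(-scores, kind="stable")
    return np.sort(order[:k])

# Toy inputs: 4 unlabeled examples, 2 classes, 2 recorded training epochs.
teacher_probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.45, 0.55]])
epoch_probs = np.array([
    [[0.8, 0.2], [0.5, 0.5], [0.3, 0.7], [0.6, 0.4]],
    [[0.9, 0.1], [0.7, 0.3], [0.1, 0.9], [0.5, 0.5]],
])

pseudo = pseudo_label(teacher_probs)
scores = difficulty_from_dynamics(epoch_probs, pseudo)
kept = select_coreset(scores, keep_fraction=0.5)
print(pseudo, scores, kept)
```

In this toy run the examples whose pseudo-label the model is least sure about over training receive the highest scores and are retained. Other training-dynamics scores and selection rules could be substituted without changing the overall structure.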

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Offers a more cost-effective way to prepare large datasets for deep learning training, potentially accelerating research and development by reducing computational and storage requirements.

RANK_REASON This is a research paper detailing a new method for dataset pruning.


COVERAGE [1]

  1. arXiv cs.LG TIER_1 · Yeseul Cho, Baekrok Shin, Changmin Kang, Chulhee Yun

    Label-Efficient Dataset Pruning via Semi-Supervised Pseudo-Labeling

    arXiv:2605.23198v1. Abstract: Dataset pruning reduces the storage and training costs of deep learning by selecting an informative subset from a large dataset. However, most existing pruning methods require fully labeled data, which limits their applicability in …