Researchers develop new algorithm for optimal subdata selection in machine learning

By PulseAugur Editorial Team · [2 sources] · 2026-04-27 01:12

Researchers have developed a new methodology for selecting optimal subsets of data (subdata) when a dataset is too large for the available computing resources or when labeling is expensive. The approach, based on optimal approximate design theory, aims to retain as much information as possible for parameter estimation in parametric models. The algorithm is designed to be general, accommodating various dataset sizes and optimality criteria, and comes with a proof of convergence. The methodology also provides bounds for assessing the efficiency of the selected subdata, and it is reported to outperform existing methods.
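To make the idea of "retaining information for parameter estimation" concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a simple linear model y = b0 + b1*x, uses the D-criterion (the determinant of the information matrix) as the optimality criterion, and compares a uniform random subsample with a simple heuristic that keeps the most extreme covariate values. All function names and the simulated data are hypothetical and chosen only for illustration.

```python
import random


def information_matrix(xs):
    """Entries (n, sum x, sum x^2) of X^T X for the model y = b0 + b1*x."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x * x for x in xs)
    return n, s1, s2


def d_criterion(xs):
    """Determinant of the 2x2 information matrix [[n, s1], [s1, s2]]."""
    n, s1, s2 = information_matrix(xs)
    return n * s2 - s1 * s1


def random_subdata(xs, k, rng):
    """Uniform random subsample of size k (baseline)."""
    return rng.sample(xs, k)


def extreme_subdata(xs, k):
    """Illustrative heuristic: keep the smallest and largest covariate values."""
    ordered = sorted(xs)
    half = k // 2
    return ordered[:half] + ordered[len(ordered) - (k - half):]


# Simulated covariates (assumption: standard normal, 10,000 points).
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
k = 100

det_random = d_criterion(random_subdata(full, k, rng))
det_extreme = d_criterion(extreme_subdata(full, k))

# D-efficiency of the random subsample relative to the extreme-value subsample;
# the exponent 1/p uses p = 2 parameters (intercept and slope).
relative_efficiency = (det_random / det_extreme) ** 0.5

print("det (random): ", det_random)
print("det (extreme):", det_extreme)
print("relative D-efficiency of random subsample:", relative_efficiency)
```

A relative efficiency below 1 indicates that the random subsample carries less information about the two parameters than the comparison subsample of the same size. The efficiency bounds mentioned in the summary serve a related purpose, but their exact form is not given here.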

Impact: Offers a more efficient approach to data selection for machine learning tasks, potentially reducing computational costs and improving model training.

Ranking rationale: Academic paper detailing a new methodology for subdata selection.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

arXiv stat.ML TIER_1 English(EN) · Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang · 2026-04-28 04:00

Nearly Optimal Subdata Selection

arXiv:2604.23930v1 Announce Type: cross Abstract: When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for furth…

arXiv stat.ML TIER_1 English(EN) · Xueqin Wang · 2026-04-27 01:12

Nearly Optimal Subdata Selection

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further consideration. A central question for selecting…
