Nearly Optimal Subdata Selection

Researchers develop a new optimal subdata selection algorithm for machine learning

By the PulseAugur editorial team · [2 sources] · 2026-04-27 01:12

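Researchers have developed a new method for selecting an optimal subset of data when a dataset is too large to process or when labeling is expensive. The method builds on the theory of optimal approximate designs and aims to retain as much information as possible for parameter estimation in parametric models. The algorithm is general by design: it adapts to different dataset sizes and optimality criteria, and it comes with a convergence proof. The method also provides bounds for assessing the efficiency of the selected subdata, and according to the summary it outperforms existing methods.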
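
To make the idea of design-based subdata selection concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a linear model and the D-optimality criterion, uses the classical multiplicative algorithm to compute approximate design weights over the full dataset, and then keeps the `k` points with the largest weights. The function names (`d_optimal_weights`, `variance_function`, `select_subdata`), the top-`k` selection rule, and the synthetic data are all illustrative assumptions. The efficiency figure is the textbook lower bound `p / max_i d_i` for the approximate design, not the bound derived in the paper.

```python
import numpy as np

def variance_function(X, w):
    """d_i = x_i^T M(w)^{-1} x_i, with M(w) = sum_i w_i x_i x_i^T."""
    M = X.T @ (w[:, None] * X)
    return np.einsum("ij,jk,ik->i", X, np.linalg.inv(M), X)

def d_optimal_weights(X, n_iter=100):
    """Classical multiplicative algorithm for D-optimal approximate design weights."""
    n, p = X.shape
    w = np.full(n, 1.0 / n)          # start from the uniform design
    for _ in range(n_iter):
        d = variance_function(X, w)
        w = w * d / p                # the weights still sum to 1, since sum_i w_i d_i = p
    return w

def select_subdata(X, k, n_iter=100):
    """Illustrative rule (an assumption): keep the k points with the largest weights."""
    w = d_optimal_weights(X, n_iter)
    idx = np.argsort(w)[-k:]
    return idx, w

# Synthetic full data: intercept plus three Gaussian covariates (hypothetical example).
rng = np.random.default_rng(0)
n, p, k = 2000, 4, 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

idx, w = select_subdata(X, k)

# Textbook lower bound on the D-efficiency of the approximate design w.
bound = p / variance_function(X, w).max()

# Compare the log-determinant of the information matrix with a random subsample.
rand_idx = rng.choice(n, size=k, replace=False)
logdet_sel = np.linalg.slogdet(X[idx].T @ X[idx] / k)[1]
logdet_rand = np.linalg.slogdet(X[rand_idx].T @ X[rand_idx] / k)[1]
print(f"efficiency lower bound of approximate design: {bound:.3f}")
print(f"log det (selected): {logdet_sel:.3f}, log det (random): {logdet_rand:.3f}")
```

A larger log-determinant means more information about the model parameters under the D-criterion, which is the sense in which a selected subset can be more informative than a random one of the same size.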

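Impact: Offers a more effective data selection method for machine learning tasks, which may lower computational cost and improve model training.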

Ranking rationale: An academic paper detailing a new method for subdata selection.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv stat.ML TIER_1 English(EN) · Min Yang, Wei Zheng, John Stufken, Ming-Chung Chang, Ting Tian, Xueqin Wang · 2026-04-28 04:00

Nearly Optimal Subdata Selection

arXiv:2604.23930v1 Announce Type: cross Abstract: When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for furth…
arXiv stat.ML TIER_1 English(EN) · Xueqin Wang · 2026-04-27 01:12

Nearly Optimal Subdata Selection

When, in terms of the number of data points, the size of a dataset exceeds available computing resources, or when labeling is expensive, an attractive solution consists of selecting only some of the data points (subdata) for further consideration. A central question for selecting…