Spiking the training data to correct for test set contamination

New method fixes machine learning test set contamination by "spiking" the training data

By the PulseAugur editorial team · [1 source] · 2026-05-26 04:00

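Researchers have proposed a new method called "spiking" to address test set contamination in machine learning evaluation. The technique deliberately contaminates the training data with a known set of test examples, and these known cases are used to calibrate memorization predictors. The calibrated predictors can then be used to statistically correct inflated test scores, offering a principled way to make evaluations of model performance more accurate.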
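The summary describes the correction only at a high level. The sketch below is one simple way such a correction could be set up, under assumptions that are ours and not the paper's: the memorization predictor outputs a binary flag; its hit rate is measured on the spiked examples (known to be contaminated) and its false-alarm rate on a hypothetical set of examples assumed to be clean; the contamination rate of the remaining, non-spiked test set is recovered with a Rogan-Gladen prevalence correction; and naturally contaminated examples are assumed to score like spiked ones. All function names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Calibration:
    tpr: float  # P(flagged | contaminated), measured on spiked examples
    fpr: float  # P(flagged | clean), measured on examples assumed clean


def calibrate(spiked_flags: list[bool], clean_flags: list[bool]) -> Calibration:
    """Estimate a binary memorization predictor's hit and false-alarm rates.

    spiked_flags: predictor outputs on test examples deliberately added to training.
    clean_flags: predictor outputs on examples assumed never seen in training
    (hypothetical, e.g. written after the training data was collected).
    """
    return Calibration(
        tpr=sum(spiked_flags) / len(spiked_flags),
        fpr=sum(clean_flags) / len(clean_flags),
    )


def contamination_rate(test_flags: list[bool], cal: Calibration) -> float:
    """Rogan-Gladen estimate of the contaminated fraction of the non-spiked test set."""
    if cal.tpr <= cal.fpr:
        raise ValueError("predictor is uninformative: tpr must exceed fpr")
    flagged = sum(test_flags) / len(test_flags)
    pi = (flagged - cal.fpr) / (cal.tpr - cal.fpr)
    return min(max(pi, 0.0), 1.0)  # clip to a valid proportion


def corrected_accuracy(observed_acc: float, contaminated_acc: float, pi: float) -> float:
    """Solve observed = pi * contaminated + (1 - pi) * clean for the clean accuracy.

    observed_acc is measured on the non-spiked test examples; contaminated_acc is
    taken from the spiked examples, assuming naturally contaminated examples
    behave like spiked ones.
    """
    if pi >= 1.0:
        raise ValueError("estimated contamination is 100%; nothing clean to score")
    return (observed_acc - pi * contaminated_acc) / (1.0 - pi)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    cal = calibrate([True] * 80 + [False] * 20, [True] * 10 + [False] * 90)
    pi = contamination_rate([True] * 240 + [False] * 760, cal)
    print(f"estimated contamination rate: {pi:.2f}")
    print(f"corrected accuracy on clean examples: {corrected_accuracy(0.70, 0.95, pi):.4f}")
```

With these made-up numbers, the predictor flags 24% of the non-spiked test examples, which the correction maps to an estimated 20% contamination rate, and a raw accuracy of 0.70 is adjusted down to about 0.64 for the clean portion.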

Impact: Offers a statistical way to make machine learning model evaluation more reliable by correcting scores on contaminated test data.

Ranking rationale: This cluster contains one academic paper describing a new method for a specific problem in machine learning evaluation.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia · 2026-05-26 04:00

Spiking the training data to correct for test set contamination

arXiv:2605.24818v1 · Announce type: cross

Abstract: The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by intentionally contaminating some test examp…