Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

Speeding up AI training by reusing smaller datasets

By the PulseAugur editorial team · [2 sources] · 2026-05-19 17:28

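Researchers have found that repeatedly training on a smaller dataset can speed up learning in AI models. This phenomenon, called the "small-vs-large gap", can save compute compared with training on a larger dataset, and it is not explained by prior theory. The study attributes the speedup to sampling biases that promote layer-wise growth, an effect that is stronger with smaller datasets. This gives AI models an active optimization strategy, particularly for reasoning tasks.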
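
As a rough illustration of the comparison described above, the Python sketch below builds two batch schedules under the same step budget, one drawing from a small pool of examples and one from a much larger pool, and reports how often each example is revisited. The pool sizes, step count, batch size, and the helper names batch_schedule and repetition_stats are assumptions chosen for illustration and are not taken from the paper. The sketch shows only the equal-compute setup, not the learning dynamics or the sampling-bias mechanism.

```python
import random
from collections import Counter


def batch_schedule(pool_size, num_steps, batch_size, seed=0):
    """Return a list of index batches, each sampled from a pool of `pool_size` examples.

    Hypothetical helper: every step draws `batch_size` distinct indices,
    so the compute per step is the same regardless of pool size.
    """
    rng = random.Random(seed)
    return [rng.sample(range(pool_size), batch_size) for _ in range(num_steps)]


def repetition_stats(schedule):
    """Summarize how many distinct examples were seen and how often they repeat."""
    counts = Counter(index for batch in schedule for index in batch)
    total = sum(counts.values())
    return {
        "unique_examples": len(counts),
        "total_examples_seen": total,
        "mean_repeats": total / len(counts),
    }


# Assumed illustrative budget: 200 steps of 32 examples each for both runs.
NUM_STEPS, BATCH_SIZE = 200, 32

small = repetition_stats(batch_schedule(64, NUM_STEPS, BATCH_SIZE))
large = repetition_stats(batch_schedule(100_000, NUM_STEPS, BATCH_SIZE))

print("small pool:", small)
print("large pool:", large)
```

Both runs process the same number of examples, so any difference in training speed between them would come from which examples are seen and how often, not from extra compute.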

Impact: Proposes a new approach to optimizing AI training that may lower compute costs and improve performance, especially on reasoning tasks.

Ranking rationale: This cluster contains an academic paper detailing a new finding about AI training methods.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Jingwen Liu, Ezra Edelman, Surbhi Goel, Bingbin Liu · 2026-05-22 04:00

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

arXiv:2605.20314v1 Announce Type: cross Abstract: This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimi…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-19 17:28

Less Data, Faster Training: repeating smaller datasets speeds up learning via sampling biases

This work investigates the "small-vs-large gap", where repeating on fewer samples can lead to compute saving during training compared to using a larger dataset. This is observed across algorithmic tasks, architectures and optimizers and cannot be explained using prior theory. W…

