Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

New theory reveals optimal learning rate schedules for deep learning

By the PulseAugur editorial team · [1 source] · 2026-05-11 04:00

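Researchers have developed a theoretical framework for optimal learning rate schedules in deep learning, analyzing a random feature model trained with stochastic gradient descent (SGD). The study identifies two distinct training regimes: an "easy phase," in which the optimal schedule follows a polynomial decay, and a "hard phase," characterized by a constant initial learning rate followed by annealing. According to the summary, the theoretical model outperforms existing benchmarks with respect to learning rate and batch size and offers insight into how learning rate transferability depends on the structure of the model and task, with experimental validation on ResNet and GPT-2 models.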
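To make the two schedule shapes concrete, here is a minimal Python sketch. It is not the paper's derivation: the functional forms (a `(1 - t/T)**delta` decay for the easy phase, and a constant rate followed by a linear ramp for the hard phase) and every parameter value (`eta0`, `delta`, `anneal_frac`) are illustrative assumptions chosen only to match the qualitative description above.

```python
def easy_phase_schedule(T, eta0=0.1, delta=1.0):
    """Polynomial decay over a horizon of T steps.

    Assumed illustrative form: eta(t) = eta0 * (1 - t/T)**delta.
    eta0 and delta are hypothetical values, not taken from the paper.
    """
    return [eta0 * (1.0 - t / T) ** delta for t in range(T)]


def hard_phase_schedule(T, eta0=0.1, anneal_frac=0.1):
    """Constant learning rate, then annealing at the end of training.

    The linear ramp and the 10% annealing window are assumptions made
    for illustration; the source only says "constant, then annealed".
    """
    anneal_start = int(round(T * (1.0 - anneal_frac)))
    n_anneal = T - anneal_start
    schedule = []
    for t in range(T):
        if t < anneal_start:
            # Stable stage: hold the initial learning rate.
            schedule.append(eta0)
        else:
            # Annealing stage: ramp linearly from eta0 toward zero.
            schedule.append(eta0 * (1.0 - (t - anneal_start) / n_anneal))
    return schedule


if __name__ == "__main__":
    easy = easy_phase_schedule(100)
    hard = hard_phase_schedule(100)
    print("easy phase, first and last step:", easy[0], easy[-1])
    print("hard phase, first and last step:", hard[0], hard[-1])
```

Under these assumptions, the easy-phase list shrinks from the first step onward, while the hard-phase list stays flat for most of training and drops only in the final window.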

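Impact: Provides a theoretical foundation for optimizing deep learning training, which could lead to more efficient model development.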

Ranking rationale: An academic paper detailing theoretical advances in deep learning training methodology.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv stat.ML · Blake Bordelon, Francesco Mori · 2026-05-11 04:00

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

arXiv:2602.04774v2 Announce Type: replace-cross Abstract: Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR sched…
