New theory reveals optimal learning rate schedules for deep learning

By the PulseAugur editorial team · [1 source] · 2026-05-11 04:00

Researchers have developed a theoretical framework for optimal learning rate (LR) schedules in deep learning, based on a solvable random feature model trained with stochastic gradient descent (SGD). The study identifies two distinct training regimes: an 'easy phase', in which the optimal schedule follows a polynomial decay, and a 'hard phase', in which the optimal schedule holds the learning rate constant at first and then anneals it. The derived schedules are reported to outperform the benchmark learning rate and batch size choices they were compared against. The theory also offers insight into how LR transferability depends on the structure of the model and task, and is accompanied by experimental validation on ResNet and GPT-2 models.
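To make the two schedule shapes concrete, here is a minimal Python sketch. It is not the paper's derivation: the function names, the parametrization of the polynomial decay as a power of the remaining training fraction, the linear annealing, and all default values (`base_lr`, `power`, `anneal_fraction`) are assumptions chosen for illustration only.

```python
def polynomial_decay_lr(step, total_steps, base_lr=0.1, power=1.0):
    """Easy-phase-style shape: LR shrinks polynomially in the remaining
    fraction of training. The exponent `power` is a placeholder value."""
    remaining = 1.0 - step / total_steps
    return base_lr * max(remaining, 0.0) ** power


def constant_then_anneal_lr(step, total_steps, base_lr=0.1, anneal_fraction=0.1):
    """Hard-phase-style shape: constant LR, then annealing to zero over the
    last `anneal_fraction` of training (assumed to lie in (0, 1]).
    Linear annealing is an assumption made here for simplicity."""
    anneal_start = (1.0 - anneal_fraction) * total_steps
    if step < anneal_start:
        return base_lr
    remaining = (total_steps - step) / (total_steps - anneal_start)
    return base_lr * max(remaining, 0.0)


if __name__ == "__main__":
    total = 100
    # Print both schedules at a few points in training for comparison.
    for step in (0, 25, 50, 75, 95, 100):
        easy = polynomial_decay_lr(step, total)
        hard = constant_then_anneal_lr(step, total)
        print(f"step={step:3d}  easy={easy:.4f}  hard={hard:.4f}")
```

In a training loop, either function would be called once per step to set the optimizer's learning rate. Which shape is appropriate, and with what exponents, depends according to the summary on the structure of the model and task.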

Impact: Provides a theoretical foundation for choosing learning rate schedules in deep learning, potentially leading to more efficient model training.

Ranking rationale: Academic paper detailing theoretical advances in deep learning training methodology.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv stat.ML TIER_1 English(EN) · Blake Bordelon, Francesco Mori · 2026-05-11 04:00

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

arXiv:2602.04774v2 Announce Type: replace-cross Abstract: Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR sched…
