Researchers have developed a theoretical framework for optimal learning rate schedules in deep learning, analyzing a random feature model trained with stochastic gradient descent. The study identifies two distinct training regimes: an 'easy phase', where optimal schedules follow a polynomial decay, and a 'hard phase', characterized by an initial constant learning rate followed by annealing. Schedules derived from the theory outperform existing benchmark choices of learning rate and batch size, and the analysis offers insight into how learning-rate transferability depends on model and task structure, with experimental validation on ResNet and GPT-2 models.
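A minimal sketch, in plain Python, of the two schedule shapes the summary describes. The decay exponent `power`, the hold fraction `hold_frac`, and the choice of linear annealing are illustrative assumptions, not the paper's fitted forms.

```python
# Sketch of the two learning-rate schedule shapes described above.
# The exponent, hold fraction, and linear annealing are illustrative
# assumptions; the paper's exact functional forms may differ.

def polynomial_decay(step: int, eta0: float = 0.1, power: float = 0.5) -> float:
    """'Easy phase' shape: LR decays polynomially, roughly eta0 * (t+1)^(-power)."""
    return eta0 / (step + 1) ** power

def constant_then_anneal(step: int, total_steps: int, eta0: float = 0.1,
                         hold_frac: float = 0.8) -> float:
    """'Hard phase' shape: hold a constant LR, then anneal linearly to zero."""
    hold_steps = int(hold_frac * total_steps)
    if step < hold_steps:
        return eta0
    remaining = max(total_steps - hold_steps, 1)
    return eta0 * max(0.0, 1.0 - (step - hold_steps) / remaining)

if __name__ == "__main__":
    T = 1000
    for t in (0, 500, 800, 900, 999):
        print(t, round(polynomial_decay(t), 4), round(constant_then_anneal(t, T), 4))
```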
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides a theoretical foundation for optimizing deep learning training, potentially leading to more efficient model development.
RANK_REASON Academic paper detailing theoretical advancements in deep learning training methodologies.