PulseAugur

LLM pretraining research finds learning rate decay wastes best data

Researchers have identified a critical incompatibility between curriculum-based LLM pretraining and standard learning rate decay schedules. Their findings suggest that decaying learning rates can negate the benefits of training on high-quality data in a specific order. The study proposes two strategies to mitigate this issue: using a more moderate learning rate decay or employing model averaging instead of decay. These methods improved benchmark scores by 1.64% over random shuffling on 1.5B-parameter models trained on 30B tokens.
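A minimal sketch of the two mitigations, assuming a standard PyTorch training loop (illustrative, not the authors' code): `moderate_cosine` anneals the learning rate only down to 30% of its peak instead of toward zero, while `AveragedModel` keeps a running weight average as the decay-free alternative. The function name, the 0.3 floor, and the toy model and loss are assumptions for illustration.

```python
# Hypothetical sketch of the two mitigations, assuming PyTorch.
import math
import torch
from torch.optim.swa_utils import AveragedModel

def moderate_cosine(step, total_steps, peak_lr, final_frac=0.3):
    """Cosine schedule that decays only to `final_frac` of the peak LR
    instead of annealing toward zero (a 'more moderate' decay)."""
    cos = 0.5 * (1 + math.cos(math.pi * step / total_steps))
    return peak_lr * (final_frac + (1 - final_frac) * cos)

model = torch.nn.Linear(16, 16)           # stand-in for the LLM
opt = torch.optim.AdamW(model.parameters(), lr=3e-4)
avg_model = AveragedModel(model)          # running weight average (decay-free alternative)

total_steps, peak_lr = 1000, 3e-4
for step in range(total_steps):
    # Option A: moderate decay -- keep a nontrivial LR while the late
    # (high-quality) curriculum data is being trained on.
    for group in opt.param_groups:
        group["lr"] = moderate_cosine(step, total_steps, peak_lr)

    loss = model(torch.randn(8, 16)).pow(2).mean()   # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()

    # Option B: keep the LR high and average checkpoints instead of decaying.
    avg_model.update_parameters(model)
```

In practice the two options are alternatives rather than a combination: either soften the decay so late, high-quality curriculum data is still learned at a useful rate, or keep the learning rate up and rely on weight averaging for the stabilization that decay normally provides.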

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Co-designing data curricula with optimization methods could unlock performance gains in LLM pretraining.

RANK_REASON Academic paper detailing a novel finding in LLM pretraining methodology.


COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Kairong Luo, Zhenbo Sun, Haodong Wen, Xinyu Shi, Jiarui Cui, Chenyi Dang, Kaifeng Lyu, Wenguang Chen

    How Learning Rate Decay Wastes Your Best Data in Curriculum-Based LLM Pretraining

    arXiv:2511.18903v2 Announce Type: replace-cross Abstract: Due to the scarcity of high-quality data, large language models (LLMs) are often trained on mixtures of data with varying quality levels, even after sophisticated data curation. A natural approach to better leverage high-q…
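As a rough illustration of the curriculum setup the abstract describes, here is a hypothetical sketch of ordering a mixed-quality corpus so the highest-quality documents are seen last in training; the `quality_score` field and ascending sort are assumptions for illustration, not the paper's exact recipe.

```python
# Hypothetical quality-ordered curriculum: documents carry a quality score and
# are presented in ascending order, so the best data arrives late in training
# (exactly where aggressive LR decay would blunt its effect).
from dataclasses import dataclass

@dataclass
class Document:
    text: str
    quality_score: float   # e.g., from a data-quality classifier (assumed)

corpus = [
    Document("low-quality web text", 0.2),
    Document("mid-quality forum post", 0.5),
    Document("high-quality reference article", 0.9),
]

# Low-to-high ordering: high-quality data lands in the final training steps.
curriculum = sorted(corpus, key=lambda d: d.quality_score)
for doc in curriculum:
    print(f"{doc.quality_score:.1f}  {doc.text}")
```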