PulseAugur

Open problem: AdamW optimizer's effectiveness under heavy-tailed noise in LLMs

A recent paper poses an open problem: is the AdamW optimizer effective for training large language models (LLMs) when the stochastic gradient noise is heavy-tailed? AdamW is widely used, but its theoretical analysis is largely confined to finite-variance settings, even though empirical evidence suggests that heavy-tailed noise is common in LLM pretraining. The paper asks whether AdamW can converge in this regime, contrasts it with optimizers such as Lion and Muon, which have been shown to converge under heavy-tailed noise, and provides a weighted-metric benchmark and a lower-bound mechanism.
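To make the setting concrete, here is a minimal sketch of one AdamW update applied to a toy quadratic objective whose gradients are corrupted by Student-t noise. It is an illustration only, not the paper's benchmark or construction: the objective, the hyperparameters, the function names `adamw_step` and `run_toy_problem`, and the choice of Student-t noise with 1.5 degrees of freedom (finite mean, infinite variance) are all assumptions made for this example.

```python
import numpy as np


def adamw_step(w, g, m, v, t, lr=1e-2, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    """One AdamW update with decoupled weight decay (hypothetical helper)."""
    # Exponential moving averages of the gradient and its elementwise square.
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    # Bias correction for the zero-initialised moving averages.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Adaptive step plus weight decay applied directly to the weights.
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


def run_toy_problem(steps=2000, dim=10, df=1.5, seed=0):
    """Minimise 0.5 * ||w||^2 with heavy-tailed gradient noise (assumed setup)."""
    rng = np.random.default_rng(seed)
    w = np.ones(dim)
    m = np.zeros(dim)
    v = np.zeros(dim)
    for t in range(1, steps + 1):
        # Student-t noise with 1 < df <= 2 has a finite mean but infinite variance.
        noise = rng.standard_t(df, size=dim)
        g = w + noise  # true gradient of 0.5 * ||w||^2 is w
        w, m, v = adamw_step(w, g, m, v, t)
    return w


if __name__ == "__main__":
    final_w = run_toy_problem()
    print("distance to optimum:", np.linalg.norm(final_w))
```

A toy run like this says nothing about the open problem itself, which concerns convergence guarantees; it only shows where the infinite-variance noise enters the update.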

IMPACT Clarifies theoretical limitations of a widely used LLM training optimizer, potentially guiding future research into more robust methods.

RANK_REASON The cluster contains an academic paper detailing an open problem in machine learning optimization.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv stat.ML · TIER_1 · English (EN) · Lijun Zhang

    Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

    AdamW is the de facto optimizer for training large language models (LLMs), yet the theory behind it still lives mostly in finite-variance regimes. This is increasingly unsatisfying, as empirical evidence indicates that stochastic gradient noise in LLM pretraining is typically hea…