PulseAugur · research

New research refines Adam optimizer's memory and noise dynamics

Two new research papers explore the nuances of the Adam optimizer, a popular tool in deep learning. The first proposes a "refresh rule" for Adam's momentum parameter, arguing that it should scale with the size of the training data so that performance and robustness carry over across different training scales. The second examines how mini-batch noise interacts with batch size and Adam's hyperparameters to shape the optimizer's implicit bias and generalization, particularly in multi-epoch training.

Summary written by gemini-2.5-flash-lite from 2 sources.
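
To make the hyperparameters these papers study concrete, here is a minimal NumPy sketch of the textbook Adam update. This is the standard algorithm, not either paper's modification; the learning rate and beta defaults are the usual ones, and the toy quadratic objective is only for illustration.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # m and v are exponential moving averages -- the optimizer's "memory".
        # beta1 sets how long past gradients persist in m (momentum);
        # beta2 does the same for squared gradients in v (the adaptive scale).
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)   # bias correction for the EMA warm-up
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    # Toy usage: minimize ||theta||^2 from noisy "mini-batch" gradients.
    rng = np.random.default_rng(0)
    theta = rng.normal(size=4)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, 2001):
        grad = 2 * theta + 0.1 * rng.normal(size=4)
        theta, m, v = adam_step(theta, grad, m, v, t)
    print(theta)  # theta has shrunk toward zero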

IMPACT These studies offer theoretical insights and practical tuning strategies for the Adam optimizer, potentially improving model training efficiency and generalization across various deep learning tasks.

RANK_REASON Two academic papers published on arXiv discussing theoretical and experimental aspects of the Adam optimizer.

Read on arXiv cs.LG →

COVERAGE [2]

  1. arXiv cs.LG TIER_1 · Enrique S. Quintana-Ortí

    Scaling the Memory of Balanced Adam

    Recent evidence suggests that Adam performs robustly when its momentum parameters are tied, $\beta_1 = \beta_2$, reducing the optimizer to a single remaining parameter. However, the value of this parameter is still poorly understood. We argue that, in balanced Adam, $\beta$ should not be treat…

  2. arXiv stat.ML TIER_1 · Matias D. Cattaneo, Boris Shigida

    The Effect of Mini-Batch Noise on the Implicit Bias of Adam

    arXiv:2602.01642v2 · With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token p…
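
To illustrate the first coverage entry, here is a minimal sketch of "balanced" Adam with tied momentum parameters $\beta_1 = \beta_2 = \beta$, the setting the abstract describes. The balanced_beta helper, which picks $\beta$ so the moving averages remember roughly one epoch of updates, is only a guess at what scaling the memory with training data size could look like; it is not the paper's actual rule.

    import numpy as np

    def balanced_beta(dataset_size, batch_size):
        # Hypothetical scaling (not the paper's rule): choose beta so the EMA's
        # effective memory spans roughly one epoch worth of optimizer steps.
        steps_per_epoch = max(1, dataset_size // batch_size)
        return 1.0 - 1.0 / steps_per_epoch

    def balanced_adam_step(theta, grad, m, v, t, beta, lr=1e-3, eps=1e-8):
        # Balanced Adam: beta1 = beta2 = beta, leaving a single momentum parameter.
        m = beta * m + (1 - beta) * grad
        v = beta * v + (1 - beta) * grad ** 2
        m_hat = m / (1 - beta ** t)
        v_hat = v / (1 - beta ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    # Larger datasets (at a fixed batch size) give betas closer to 1, i.e. longer memory.
    for n in (10_000, 100_000, 1_000_000):
        print(n, balanced_beta(n, batch_size=128))

    # One illustrative step with the scaled beta.
    theta, m, v = np.ones(3), np.zeros(3), np.zeros(3)
    theta, m, v = balanced_adam_step(theta, 2 * theta, m, v, t=1,
                                     beta=balanced_beta(100_000, 128))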
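
The second coverage entry concerns how mini-batch noise drives Adam's implicit bias. As a generic illustration (not the paper's analysis), the snippet below measures, on a toy least-squares problem, how the variance of a mini-batch gradient around the full-batch gradient shrinks roughly like 1/B as the batch size B grows; this is the noise whose downstream effect the paper studies.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10_000, 5
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=n)
    theta = np.zeros(d)

    def minibatch_grad(idx):
        # Gradient of the half mean-squared error over the examples indexed by idx.
        residual = X[idx] @ theta - y[idx]
        return X[idx].T @ residual / len(idx)

    full_grad = minibatch_grad(np.arange(n))
    for B in (32, 128, 512):
        draws = np.stack([minibatch_grad(rng.choice(n, size=B, replace=False))
                          for _ in range(200)])
        noise_var = np.mean(np.sum((draws - full_grad) ** 2, axis=1))
        print(f"B={B:4d}  E||g_B - g_full||^2 ~ {noise_var:.4f}")
    # The printed variance drops by roughly 4x each time the batch size quadruples.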