PulseAugur

New research refines Adam optimizer's memory and noise dynamics

Two new research papers examine the Adam optimizer, which is widely used in deep learning. The first proposes a "refresh rule" for Adam's momentum parameter, suggesting that the parameter should scale with the size of the training data to improve performance and robustness across scales. The second examines how mini-batch noise, which depends on batch size and Adam's hyperparameters, affects the optimizer's implicit bias and generalization, particularly in multi-epoch training.

IMPACT These studies offer theoretical insights and practical tuning strategies for the Adam optimizer, potentially improving model training efficiency and generalization across various deep learning tasks.

RANK_REASON Two academic papers published on arXiv discussing theoretical and experimental aspects of the Adam optimizer.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.LG TIER_1 English(EN) · Enrique S. Quintana-Ortí

    Scaling the Memory of Balanced Adam

    Recent evidence suggests that Adam performs robustly when its momentum parameters are tied, $β_1=β_2$, reducing the optimizer to a single remaining parameter. However, the value of this parameter is still poorly understood. We argue that, in balanced Adam, $β$ should not be treat…

  2. arXiv stat.ML TIER_1 English(EN) · Matias D. Cattaneo, Boris Shigida

    The Effect of Mini-Batch Noise on the Implicit Bias of Adam

    arXiv:2602.01642v2. With limited high-quality data and growing compute, multi-epoch training is gaining back its importance across sub-areas of deep learning. Adam(W), versions of which are go-to optimizers for many tasks such as next token p…
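
For readers unfamiliar with the "balanced Adam" setting named in the first abstract, the sketch below shows a standard Adam update in which both momentum parameters are tied to one value, $β_1=β_2=β$. It is an illustration only: the function name `balanced_adam_step`, the default `beta=0.95`, and the toy objective are assumptions made here, and the code does not implement the scaling rule proposed in the paper, which the truncated abstract does not specify.

```python
import math

def balanced_adam_step(params, grads, m, v, step, lr=1e-3, beta=0.95, eps=1e-8):
    """One Adam update with tied momentum parameters (beta1 == beta2 == beta).

    Hypothetical helper for illustration; beta=0.95 is an arbitrary choice,
    not a value taken from either paper.
    """
    # With a shared beta, both moment estimates use the same bias correction.
    correction = 1.0 - beta ** step
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta * m_i + (1.0 - beta) * g        # first moment (mean of gradients)
        v_i = beta * v_i + (1.0 - beta) * g * g    # second moment (mean of squared gradients)
        m_hat = m_i / correction
        v_hat = v_i / correction
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


def run_toy_example(steps=100, lr=0.01, beta=0.95):
    """Apply the update to f(x) = x^2 (gradient 2x), starting from x = 3.0."""
    params, m, v = [3.0], [0.0], [0.0]
    for step in range(1, steps + 1):
        grads = [2.0 * p for p in params]
        params, m, v = balanced_adam_step(params, grads, m, v, step, lr=lr, beta=beta)
    return params[0]


if __name__ == "__main__":
    print(run_toy_example())
```

One property of the tied setting that is easy to check: because both moments average the gradient history with identical weights, the ratio of the first moment to the square root of the second cannot exceed 1 in magnitude, so each parameter moves by at most `lr` per step.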