PulseAugur

New optimizers promise faster, more memory-efficient AI model training

Three new research papers introduce optimization techniques for deep learning models. The first, "Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization," proposes Hyperball, an optimizer wrapper that fixes the norms of weight matrices so that the speedups of matrix-based optimizers such as Muon over AdamW persist as model size grows. The second, "OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality," presents OptEMA, an adaptive exponential moving average (EMA) optimizer that achieves near-optimal rates in the zero-noise regime without manual hyperparameter tuning. The third, "Gefen: Optimized Stochastic Optimizer," introduces Gefen, a memory-efficient optimizer that reduces the memory of AdamW's optimizer state by approximately 8x while maintaining performance, freeing memory for larger batch sizes and potentially larger models.
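The summary says Hyperball works "by fixing weight matrix norms." Below is a minimal sketch of that general idea, assuming a wrapper that takes any base update and then rescales the matrix back to a fixed Frobenius norm. Plain SGD stands in for the base optimizer, and the names `fixed_norm_step`, `sgd_step`, and `target_norm` are hypothetical. This is not the paper's algorithm, whose exact update rule is not given here.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]

def frobenius_norm(w: Matrix) -> float:
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in w for x in row))

def sgd_step(w: Matrix, g: Matrix, lr: float) -> Matrix:
    """Stand-in base optimizer (plain SGD); Muon or AdamW would slot in here."""
    return [[wi - lr * gi for wi, gi in zip(w_row, g_row)]
            for w_row, g_row in zip(w, g)]

def fixed_norm_step(w: Matrix, g: Matrix, lr: float, target_norm: float,
                    base_step: Callable[[Matrix, Matrix, float], Matrix] = sgd_step) -> Matrix:
    """Hypothetical wrapper: run the base step, then rescale W back to target_norm."""
    updated = base_step(w, g, lr)
    norm = frobenius_norm(updated)
    if norm == 0.0:
        return updated  # nothing to rescale
    scale = target_norm / norm
    return [[x * scale for x in row] for row in updated]

w = [[3.0, 0.0], [0.0, 4.0]]  # ||W||_F = 5
g = [[1.0, 1.0], [1.0, 1.0]]
w_next = fixed_norm_step(w, g, lr=0.1, target_norm=frobenius_norm(w))
print(round(frobenius_norm(w_next), 6))  # 5.0
```

Because the norm is pinned, each step in this sketch changes only the direction of W. The size of an update relative to the weights then stays tied to the learning rate instead of drifting as the weight norm grows or shrinks. How Hyperball chooses the target norm and how it interacts with weight decay are not stated in this summary.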

IMPACT These new optimization techniques could lead to faster training times and enable the development of larger, more complex AI models by reducing memory constraints.

RANK_REASON Multiple arXiv papers detailing new optimization techniques for deep learning models.


AI-generated summary · Google Gemini · from 5 sources.

COVERAGE [5]

  1. arXiv cs.LG TIER_1 English(EN) · Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, Tengyu Ma, Percy Liang ·

    Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

    arXiv:2606.16899v1 Announce Type: new Abstract: Matrix based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We …

  2. arXiv cs.LG TIER_1 English(EN) · Ganzhao Yuan ·

    OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

    arXiv:2603.09923v4 Announce Type: replace Abstract: Exponential moving averages (EMAs) are a central component of widely used adaptive optimizers such as Adam. However, existing analyses of Adam-style methods often yield suboptimal guarantees in the zero-noise regime, rely on ope…

  3. arXiv cs.LG TIER_1 English(EN) · Percy Liang ·

    Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

    Matrix based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper th…

  4. arXiv cs.AI TIER_1 English(EN) · Nadav Benedek, Tomer Koren, Ohad Fried ·

    Gefen: Optimized Stochastic Optimizer

    arXiv:2606.13894v1 Announce Type: cross Abstract: AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares sec…

  5. arXiv cs.CL TIER_1 English(EN) · Ohad Fried ·

    Gefen: Optimized Stochastic Optimizer

    AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and q…
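The Gefen and OptEMA abstracts both start from AdamW, whose first- and second-moment EMAs are each as large as the parameters. The sketch below shows a standard AdamW step with those two buffers. It also shows a hypothetical variant that keeps one second-moment scalar per block of parameters, as a simplified illustration of "sharing second-moment estimates across parameter blocks." The block-mean rule and the names `shared_v_step` and `state_floats` are assumptions. This is not Gefen's method, and the rest of its abstract is truncated.

```python
import math
from typing import List, Optional

def adamw_step(p: List[float], g: List[float], m: List[float], v: List[float],
               t: int, lr: float = 1e-3, b1: float = 0.9, b2: float = 0.999,
               eps: float = 1e-8, wd: float = 0.01) -> None:
    """One in-place AdamW step; m and v are the two parameter-sized EMA buffers."""
    for i in range(len(p)):
        m[i] = b1 * m[i] + (1 - b1) * g[i]         # first-moment EMA
        v[i] = b2 * v[i] + (1 - b2) * g[i] ** 2    # second-moment EMA
        m_hat = m[i] / (1 - b1 ** t)               # bias corrections
        v_hat = v[i] / (1 - b2 ** t)
        # Adam direction plus decoupled weight decay (uses the pre-update p[i])
        p[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p[i])

def shared_v_step(p: List[float], g: List[float], m: List[float],
                  v_blocks: List[float], block: int, t: int, lr: float = 1e-3,
                  b1: float = 0.9, b2: float = 0.999, eps: float = 1e-8,
                  wd: float = 0.01) -> None:
    """Hypothetical variant: one second-moment scalar per block of `block` parameters.

    v_blocks must hold ceil(len(p) / block) entries; m stays parameter-sized.
    """
    for b in range(len(v_blocks)):
        lo, hi = b * block, min((b + 1) * block, len(p))
        mean_sq = sum(g[i] ** 2 for i in range(lo, hi)) / (hi - lo)
        v_blocks[b] = b2 * v_blocks[b] + (1 - b2) * mean_sq  # shared EMA
        v_hat = v_blocks[b] / (1 - b2 ** t)
        for i in range(lo, hi):
            m[i] = b1 * m[i] + (1 - b1) * g[i]
            m_hat = m[i] / (1 - b1 ** t)
            p[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p[i])

def state_floats(n_params: int, block: Optional[int] = None) -> int:
    """Optimizer-state size in floats: AdamW keeps 2 per parameter;
    block sharing keeps m plus one v per block."""
    if block is None:
        return 2 * n_params
    return n_params + math.ceil(n_params / block)

print(state_floats(1_000_000))       # 2000000
print(state_floats(1_000_000, 128))  # 1007813
```

At equal precision, sharing the second moment alone cuts optimizer state by at most about 2x, because m remains parameter-sized. The reported ~8x reduction therefore has to come from something beyond this simplified scheme.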