PulseAugur

New optimizers promise faster, more memory-efficient AI model training

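Three new research papers introduce novel optimization techniques for deep learning models. The first, "Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization", proposes Hyperball, an optimizer wrapper that fixes weight-matrix norms so that performance gains persist as model scale grows. The second, "OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality", proposes OptEMA, an adaptive EMA optimizer that reaches near-optimal rates in the zero-noise regime without manual hyperparameter tuning. The third, "Gefen: Optimized Stochastic Optimizer", introduces Gefen, a memory-efficient optimizer that cuts AdamW's memory footprint by roughly 8x while maintaining performance, which allows larger batch sizes and potentially larger models.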
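As a rough illustration of what "fixing the weight-matrix norm" can mean for an optimizer wrapper, the sketch below applies an arbitrary update and then rescales the matrix back to a target norm. This is a sketch under stated assumptions, not the Hyperball algorithm: the summary only says that the norm is fixed, so the use of the Frobenius norm, the rescale-after-step ordering, and the names `frobenius_norm`, `rescale_to_norm`, and `wrapped_step` are all hypothetical choices made for illustration.

```python
import math

def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def rescale_to_norm(matrix, target_norm):
    """Scale the matrix so its Frobenius norm equals target_norm."""
    norm = frobenius_norm(matrix)
    if norm == 0.0:
        # A zero matrix cannot be rescaled; return a copy unchanged.
        return [row[:] for row in matrix]
    scale = target_norm / norm
    return [[x * scale for x in row] for row in matrix]

def wrapped_step(weights, update, target_norm):
    """Apply a base-optimizer update, then restore the fixed norm.

    `update` stands in for whatever step the wrapped optimizer produced
    (assumption: it is already multiplied by the learning rate).
    """
    stepped = [
        [w - u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weights, update)
    ]
    return rescale_to_norm(stepped, target_norm)

weights = [[3.0, 0.0], [0.0, 4.0]]      # Frobenius norm is 5.0
target = frobenius_norm(weights)        # keep the initial norm fixed
update = [[0.3, -0.1], [0.2, 0.4]]      # hypothetical optimizer step
new_weights = wrapped_step(weights, update, target)
```

After the step, `new_weights` has the same Frobenius norm as the original matrix, so only its direction has changed. How the actual paper chooses the norm and the target value is not stated in the summary.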

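Impact: By easing memory constraints, these new optimization techniques could shorten training time and support the development of larger, more complex AI models.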
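To make the memory claim concrete, the back-of-the-envelope calculation below estimates optimizer-state size. It relies on the Gefen abstract's statement that AdamW's first and second moment states add roughly two parameter-sized buffers, and on the summary's figure of roughly 8x. The parameter count, the 4-byte (fp32) state precision, and the reading of "8x" as applying to optimizer state only are assumptions made for illustration, and the function names are hypothetical.

```python
def adamw_state_bytes(num_params, bytes_per_value=4):
    """AdamW keeps two parameter-sized buffers (first and second moments)."""
    return 2 * num_params * bytes_per_value

def reduced_state_bytes(num_params, reduction_factor=8.0, bytes_per_value=4):
    """State size after an assumed reduction factor (about 8x in the summary)."""
    return adamw_state_bytes(num_params, bytes_per_value) / reduction_factor

num_params = 1_000_000_000  # hypothetical 1B-parameter model
gib = 2 ** 30

adamw_gib = adamw_state_bytes(num_params) / gib
reduced_gib = reduced_state_bytes(num_params) / gib
print(f"AdamW state: {adamw_gib:.2f} GiB, reduced: {reduced_gib:.2f} GiB")
# prints "AdamW state: 7.45 GiB, reduced: 0.93 GiB"
```

Under these assumptions the freed memory, about 6.5 GiB per billion parameters, is what could be spent on larger batches or a larger model.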

Ranking rationale: Multiple arXiv papers detail new optimization techniques for deep learning models.


AI-generated summary · Google Gemini · from 5 sources.

Sources [5]

  1. arXiv cs.LG TIER_1 English(EN) · Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, Tengyu Ma, Percy Liang ·

    Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

    arXiv:2606.16899v1 Announce Type: new Abstract: Matrix based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We …

  2. arXiv cs.LG TIER_1 English(EN) · Ganzhao Yuan ·

    OptEMA: Adaptive Exponential Moving Average for Stochastic Optimization with Zero-Noise Optimality

    arXiv:2603.09923v4 Announce Type: replace Abstract: Exponential moving averages (EMAs) are a central component of widely used adaptive optimizers such as Adam. However, existing analyses of Adam-style methods often yield suboptimal guarantees in the zero-noise regime, rely on ope…

  3. arXiv cs.LG TIER_1 English(EN) · Percy Liang ·

    Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

    Matrix based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper th…

  4. arXiv cs.AI TIER_1 English(EN) · Nadav Benedek, Tomer Koren, Ohad Fried ·

    Gefen: Optimized Stochastic Optimizer

    arXiv:2606.13894v1 Announce Type: cross Abstract: AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares sec…

  5. arXiv cs.CL TIER_1 English(EN) · Ohad Fried ·

    Gefen: Optimized Stochastic Optimizer

    AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and q…