New PowerStep optimizer halves memory use for large model training

By PulseAugur Editorial Team · [1 source] · 2026-05-11 10:36

Researchers have introduced PowerStep, a memory-efficient optimizer for training large neural networks. Adaptive optimizers such as Adam store running gradient statistics for every parameter; PowerStep instead achieves adaptivity by applying a nonlinear transform to a momentum buffer. This halves optimizer-state memory and, when combined with quantization, cuts it to roughly one-eighth of Adam's while maintaining comparable convergence speed.
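The summary does not give PowerStep's exact update rule, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes the nonlinear transform is the standard steepest-descent direction under an $\ell_p$ norm, sign(m)·|m|^(q-1) with 1/p + 1/q = 1, applied elementwise to a momentum buffer. The function names, the default hyperparameters, and the 8-bit quantization width are hypothetical choices made for illustration; the byte counts assume fp32 state for Adam.

```python
import math


def optimizer_state_bytes(num_params, buffers, bytes_per_value):
    """Optimizer-state size: one stored value per parameter per buffer."""
    return num_params * buffers * bytes_per_value


def power_transform(m, q):
    """Elementwise sign(m) * |m|**(q - 1), assuming q > 1.

    Up to scaling, this is the steepest-descent direction under an l_p norm
    whose dual exponent is q (1/p + 1/q = 1). With q = 2 it is the identity.
    """
    return [math.copysign(abs(v) ** (q - 1.0), v) for v in m]


def momentum_power_step(params, grads, momentum, lr=0.1, beta=0.9, q=1.5):
    """One hypothetical update that keeps only a momentum buffer as state."""
    # Exponential moving average of gradients: the only per-parameter state.
    new_m = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grads)]
    # Adaptivity comes from the transform, not from a second-moment buffer.
    direction = power_transform(new_m, q)
    new_p = [p - lr * d for p, d in zip(params, direction)]
    return new_p, new_m


# Memory arithmetic for a 1B-parameter model (illustrative assumptions).
n = 1_000_000_000
adam_bytes = optimizer_state_bytes(n, buffers=2, bytes_per_value=4)       # two fp32 moments
momentum_bytes = optimizer_state_bytes(n, buffers=1, bytes_per_value=4)   # one fp32 buffer
quantized_bytes = optimizer_state_bytes(n, buffers=1, bytes_per_value=1)  # one 8-bit buffer (assumed)

print(adam_bytes / momentum_bytes)   # ratio of Adam state to one fp32 buffer
print(adam_bytes / quantized_bytes)  # ratio of Adam state to one 8-bit buffer
```

Under these assumptions the ratios come out to 2 and 8, which is consistent with the "halves" and "approximately eight times" figures quoted above, though the paper may reach them by a different accounting.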

Impact: Offers a more memory-efficient approach to training large models, potentially lowering hardware requirements and enabling larger-scale experiments.

Ranking rationale: The cluster contains a new academic paper detailing a novel optimization method for machine learning.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · English (EN) · Yonghong Tian · 2026-05-11 10:36

PowerStep: Memory-Efficient Adaptive Optimization via $\ell_p$-Norm Steepest Descent

Adaptive optimizers, most notably Adam, have become the default standard for training large-scale neural networks such as Transformers. These methods maintain running estimates of gradient first and second moments, incurring substantial memory overhead. We introduce PowerStep, a …
