PulseAugur

New MetaAdamW optimizer uses self-attention for adaptive learning rates

Researchers have proposed MetaAdamW, an optimizer that uses a self-attention mechanism to adapt learning rates and weight decay per parameter group. The Transformer-based module adjusts these hyperparameters dynamically from statistical features of each group, aiming to overcome the limitation of uniform settings in optimizers like AdamW. The authors report that, across diverse tasks, MetaAdamW consistently outperforms AdamW, either by reducing training time or by improving performance.
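To make the idea concrete, here is a minimal Python sketch of group-adaptive hyperparameters driven by self-attention. It is not the authors' implementation: the feature set in `group_features`, the identity projections in `self_attention`, the fixed readout weights `LR_READOUT` and `WD_READOUT`, and the `max_scale` bound are all assumptions made for illustration. Only `adamw_step` follows the standard AdamW update with decoupled weight decay.

```python
import math

def group_features(params, grads):
    """Log-scaled summary statistics for one parameter group (assumed feature set)."""
    n = len(grads)
    grad_norm = math.sqrt(sum(g * g for g in grads))
    param_norm = math.sqrt(sum(p * p for p in params))
    mean = sum(grads) / n
    grad_var = sum((g - mean) ** 2 for g in grads) / n
    return [math.log1p(grad_norm), math.log1p(param_norm), math.log1p(grad_var)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(rows):
    """Single-head scaled dot-product attention across groups.

    Queries, keys and values are the raw features (identity projections);
    a trained meta-optimizer would use learned projections instead.
    """
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        w = softmax(scores)
        out.append([sum(wi * v[j] for wi, v in zip(w, rows)) for j in range(d)])
    return out

# Hypothetical fixed readout weights standing in for a learned output head.
LR_READOUT = [-0.5, 0.3, -0.2]
WD_READOUT = [0.2, 0.4, 0.0]

def group_hyperparams(features, base_lr=1e-3, base_wd=1e-2, max_scale=2.0):
    """Return one (lr, weight_decay) pair per group, within a factor of max_scale of the base."""
    pairs = []
    for ctx in self_attention(features):
        lr_s = max_scale ** math.tanh(sum(w * c for w, c in zip(LR_READOUT, ctx)))
        wd_s = max_scale ** math.tanh(sum(w * c for w, c in zip(WD_READOUT, ctx)))
        pairs.append((base_lr * lr_s, base_wd * wd_s))
    return pairs

def adamw_step(p, g, m, v, t, lr, wd, b1=0.9, b2=0.999, eps=1e-8):
    """One standard AdamW update for a scalar parameter (decoupled weight decay)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction
    v_hat = v / (1 - b2 ** t)
    p = p - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * p)
    return p, m, v

# Two toy groups given as (params, grads); the values are made up.
groups = [([0.5, -0.2, 0.1], [0.01, -0.02, 0.005]),
          ([1.5, 2.0, -1.0], [0.8, -1.2, 0.4])]
feats = [group_features(p, g) for p, g in groups]
hyper = group_hyperparams(feats)  # one (lr, wd) pair per group
```

Each group attends to every other group's statistics before its learning rate and weight decay are scaled, which is the contrast with AdamW's single uniform setting. Bounding the scales through `tanh` is one simple way to keep a meta-controller from pushing either hyperparameter to an extreme value.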

Impact: Introduces an optimizer that could improve training efficiency and performance across various machine learning tasks.

Ranking rationale: This is a research paper detailing a new optimization algorithm for machine learning models.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.LG · English (EN) · JiangBo Zhao, ZhaoXin Liu

    A Self-Attentive Meta-Optimizer with Group-Adaptive Learning Rates and Weight Decay

    arXiv:2605.04055v1. Abstract: Adaptive optimizers like AdamW apply uniform hyperparameters across all parameter groups, ignoring heterogeneous optimization dynamics across layers and modules. We address this limitation by proposing MetaAdamW - a new optimizer th…