PulseAugur

Researchers discover hidden failure modes in Adam optimizer for continual learning

Researchers have identified a hidden failure mode that appears when gradient modification techniques are combined with the Adam optimizer in continual learning. The issue is most pronounced with shared-routing projection methods, where it can degrade performance severely and cause models to forget previously learned information. The problem stems from Adam's second-moment pathway, which can inflate the effective learning rate when gradients are modified. The proposed fix, adaptive decoupled moment routing, routes modified gradients to the first moment while preserving second-moment statistics, and it prevented performance collapse across the methods and scales tested.
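
The mechanism described above can be illustrated with a toy scalar example. This is a minimal sketch under stated assumptions, not the paper's implementation: the names `adam_step`, `run`, and `shrink` are hypothetical, the upstream modification is modeled as a simple rescaling of a constant gradient, and the adaptive part of the proposed method is not covered because the summary does not describe it. The sketch only shows that when the modified gradient feeds both moments, Adam's normalization cancels the shrinkage, whereas keeping the second moment on the unmodified gradient preserves it.

```python
import math


def adam_step(m, v, g_first, g_second, t,
              lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update with bias correction.

    g_first feeds the first moment (m); g_second feeds the second moment (v).
    Passing the same value for both gives standard Adam.
    """
    m = beta1 * m + (1 - beta1) * g_first
    v = beta2 * v + (1 - beta2) * g_second ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    step = lr * m_hat / (math.sqrt(v_hat) + eps)
    return m, v, step


def run(decoupled, shrink=0.01, steps=200):
    """Return the final update size for a constant toy gradient.

    shrink is a hypothetical stand-in for an upstream gradient modification
    (for example a projection that removes most of the gradient).
    """
    m = v = 0.0
    step = 0.0
    for t in range(1, steps + 1):
        g_raw = 1.0                # unmodified gradient (toy, constant)
        g_mod = shrink * g_raw     # gradient after upstream modification
        # Shared routing: g_mod drives both moments.
        # Decoupled routing (assumed form): g_mod drives m, g_raw drives v.
        g_second = g_raw if decoupled else g_mod
        m, v, step = adam_step(m, v, g_mod, g_second, t)
    return step


shared_step = run(decoupled=False)
decoupled_step = run(decoupled=True)
print(f"shared routing update:    {shared_step:.3e}")
print(f"decoupled routing update: {decoupled_step:.3e}")
```

In this toy setting the shared-routing update stays close to the full learning rate even though the gradient was shrunk a hundredfold, while the decoupled update is about a hundred times smaller. Real continual-learning gradients are neither constant nor scalar, so this shows only the direction of the effect.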

Impact: Identifies a critical failure mode in common continual learning setups, which may affect model robustness and call for re-evaluation of existing methods.

Ranking rationale: Academic paper describing a novel failure mode in continual learning and a proposed fix.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.LG · Tier 1 · English (EN) · Yuelin Hu, Zhenbo Yu, Zhengxue Cheng, Wei Liu, Li Song

    Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair

    arXiv:2604.22407v1. Abstract: Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adap…

  2. arXiv cs.AI · Tier 1 · English (EN) · Li Song

    Hidden Failure Modes of Gradient Modification under Adam in Continual Learning, and Adaptive Decoupled Moment Routing as a Repair

    Many continual-learning methods modify gradients upstream (e.g., projection, penalty rescaling, replay mixing) while treating Adam as a neutral backend. We show this composition has a hidden failure mode. In a high-overlap, non-adaptive 8-domain continual LM, all shared-routing p…