New FOGO optimizer combats gradient interference and forgetting

By PulseAugur Editorial · 1 source · 2026-06-10 04:00

Researchers have introduced FOGO (Forgetting-aware Orthogonalization Optimizer), an optimizer designed to counter gradient interference during model training, which the authors argue causes both short-term and long-term forgetting. FOGO spectrally orthogonalizes momentum updates and uses a compact codebook memory to detect and resolve conflicts with past update directions. The goal is to keep dominant gradients from suppressing useful but less frequent update directions, improving knowledge retention and convergence. The authors report that FOGO outperforms baseline optimizers such as Adam and Muon on tasks including class-imbalanced classification, continual learning, and fine-tuning of large language models.
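The summary names FOGO's two mechanisms but not their exact form. The following is a minimal NumPy sketch under two loudly stated assumptions: (1) "spectral orthogonalization" is the Muon-style polar factor U V^T of the momentum matrix, computed here with an exact SVD instead of Muon's Newton-Schulz iteration; (2) conflict resolution is a PCGrad-style projection that removes any component of the new update pointing against a stored past direction. `ConflictCodebook`, `fogo_like_step`, the FIFO eviction and all hyperparameters are hypothetical illustrations, not the paper's published algorithm.

```python
import numpy as np


def orthogonalize(m: np.ndarray) -> np.ndarray:
    """Replace m = U S V^T by its polar factor U V^T (all singular values become 1).

    Muon approximates this with a Newton-Schulz iteration; an exact SVD is
    used here only for clarity.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


class ConflictCodebook:
    """Hypothetical fixed-size memory of past unit-norm update directions."""

    def __init__(self, size: int = 8):
        self.size = size
        self.codes: list[np.ndarray] = []

    def resolve(self, update: np.ndarray) -> np.ndarray:
        """Project out components of `update` that point against a stored direction."""
        flat = update.ravel().copy()
        for code in self.codes:
            dot = float(flat @ code)
            if dot < 0.0:  # conflict: this update would partly undo a past direction
                flat -= dot * code  # remove the opposing component (PCGrad-style)
        return flat.reshape(update.shape)

    def store(self, update: np.ndarray) -> None:
        """Remember the normalized update, evicting the oldest entry when full."""
        flat = update.ravel()
        norm = np.linalg.norm(flat)
        if norm == 0.0:
            return
        self.codes.append(flat / norm)
        if len(self.codes) > self.size:
            self.codes.pop(0)  # FIFO eviction, an arbitrary assumption


def fogo_like_step(w: np.ndarray, grad: np.ndarray, state: dict,
                   lr: float = 0.02, beta: float = 0.95) -> np.ndarray:
    """One hypothetical update: momentum -> orthogonalize -> resolve conflicts."""
    state["m"] = beta * state["m"] + grad
    update = orthogonalize(state["m"])
    update = state["book"].resolve(update)
    state["book"].store(update)
    return w - lr * update


# Usage on a random 4x3 weight matrix with random gradients.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3))
state = {"m": np.zeros_like(w), "book": ConflictCodebook(size=4)}
for _ in range(5):
    grad = rng.normal(size=w.shape)
    w = fogo_like_step(w, grad, state)
```

The exact SVD is far more expensive than the few matrix multiplications of a Newton-Schulz iteration; it is used only so the orthogonalization is easy to verify. Projecting against every stored direction is the simplest possible conflict rule, and FOGO's actual detection and resolution criteria may differ.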

IMPACT Introduces a new optimizer aimed at better knowledge retention and convergence during training, potentially benefiting a range of AI applications.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI · TIER_1 · English (EN) · Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim · 2026-06-10 04:00

FOGO: Forgetting-aware Orthogonalization Optimizer

arXiv:2606.10406v1 · Announce Type: cross

Abstract: We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forget…
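A toy illustration of the suppression effect the abstract describes, and of why spectral orthogonalization counteracts it. The setup is hypothetical: a 2×2 weight matrix whose gradient contains one direction at every step and a second direction only every 25th step, accumulated with heavy-ball momentum (β = 0.9). It assumes FOGO's orthogonalization behaves like Muon's, replacing the momentum by its polar factor U V^T, computed here with an exact SVD.

```python
import numpy as np


def orthogonalize(m: np.ndarray) -> np.ndarray:
    """Polar factor U V^T of m: keeps the singular directions, sets singular values to 1."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def accumulate_momentum(steps: int = 100, beta: float = 0.9, rare_every: int = 25) -> np.ndarray:
    """Hypothetical momentum buffer: one direction every step, another only occasionally."""
    e1, e2 = np.eye(2)
    m = np.zeros((2, 2))
    for t in range(steps):
        g = np.outer(e1, e1)  # dominant direction, present in every mini-batch
        if t % rare_every == 0:
            g = g + np.outer(e2, e2)  # rare but useful direction
        m = beta * m + g
    return m


def dominance_ratio(update: np.ndarray) -> float:
    """Step size along the dominant direction divided by step size along the rare one."""
    return float(update[0, 0] / update[1, 1])


m = accumulate_momentum()
print(f"raw momentum:   {dominance_ratio(m):.1f}x")
print(f"orthogonalized: {dominance_ratio(orthogonalize(m)):.1f}x")
```

Under raw momentum the frequent direction dominates the update by roughly two orders of magnitude; after orthogonalization both directions receive a unit step. Orthogonalization only rebalances what is still in the momentum buffer, so directions that have already decayed out of it are not restored; the summary attributes tracking of past update directions to FOGO's codebook memory.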
