FOGO: Forgetting-aware Orthogonalization Optimizer
Researchers have introduced FOGO, an optimizer designed to reduce forgetting during AI model training. FOGO targets both the short-term forgetting that occurs at individual training steps and the long-term forgetting common in continual learning, in both cases by detecting and resolving gradient interference. It uses spectral orthogonalization and a compact codebook memory to preserve past update directions. The researchers report improved convergence and knowledge retention across tasks including fine-tuning LLaVA-7B and pretraining GPT-2, where FOGO outperforms existing optimizers such as Adam and Muon.
IMPACT: FOGO's ability to reduce forgetting could lead to more efficient and effective AI model training, particularly in continual learning scenarios.
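
The summary names two ingredients, gradient interference and spectral orthogonalization, without giving FOGO's actual procedure. The sketch below is not FOGO's algorithm. It only illustrates the two ideas in plain Python, assuming that interference means a negative inner product between a new gradient and a stored past direction, and that orthogonalization means replacing a matrix's singular values with 1 (here via a basic Newton-Schulz iteration, the kind of scheme Muon also relies on). The function names `interferes` and `orthogonalize` are hypothetical, and the codebook memory is not modeled.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def interferes(grad, past_direction):
    # Assumption: two directions "interfere" when their inner product is
    # negative, i.e. following one partly undoes progress along the other.
    return dot(grad, past_direction) < 0.0

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[dot(row, col) for col in zip(*B)] for row in A]

def orthogonalize(M, steps=30):
    # Approximates the orthogonal polar factor U V^T of a full-rank matrix M
    # with the cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X.
    # Dividing by the Frobenius norm first puts every singular value in (0, 1],
    # which is inside the iteration's convergence region.
    fro = math.sqrt(sum(x * x for row in M for x in row))
    X = [[x / fro for x in row] for row in M]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)]
             for rx, ry in zip(X, XXtX)]
    return X

# Hypothetical toy values, chosen only for illustration.
new_grad = [1.0, 0.0]
stored_direction = [-1.0, 0.5]
print(interferes(new_grad, stored_direction))  # True: inner product is -1.0

update = [[2.0, 1.0], [0.0, 1.0]]
Q = orthogonalize(update)
# Q times its transpose should be close to the identity matrix.
print(matmul(Q, transpose(Q)))
```

An orthogonalized update has all singular values equal to 1, so no single direction dominates the step. How FOGO combines this with its interference check and codebook is specified in the researchers' work, not here.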