Researchers have introduced FOGO, an optimizer designed to address gradient interference during model training, which can cause both short-term and long-term forgetting. FOGO spectrally orthogonalizes momentum updates and uses a compact codebook memory to detect and resolve conflicts with past update directions. The goal is to keep dominant gradients from suppressing valuable but less frequent update directions, improving knowledge retention and convergence. According to the paper, FOGO outperforms standard optimizers such as Adam and Muon on class-imbalanced classification, continual learning, and fine-tuning of large language models.
AI IMPACT Introduces a new method to improve model training efficiency and knowledge retention, potentially benefiting various AI applications.
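The two mechanisms named in the summary, spectral orthogonalization of the momentum and a codebook check against past update directions, can be illustrated with a rough sketch. This is not the FOGO algorithm from the paper: the function names, the exact-SVD orthogonalization, the projection rule, and the `threshold`, `lr`, and `beta` parameters are all assumptions chosen for illustration, since the summary does not specify how the codebook is built or how conflicts are resolved.

```python
import numpy as np

def orthogonalize(m):
    """Set all singular values of m to 1, keeping its singular vectors.

    Assumption: an exact SVD stands in for whatever spectral
    orthogonalization the paper actually uses.
    """
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt

def resolve_conflicts(update, codebook, threshold=0.0):
    """Project out components of `update` that oppose stored directions.

    Assumption: `codebook` is a list of unit-norm (Frobenius) matrices
    representing past update directions; a conflict is a cosine
    similarity below `threshold`.
    """
    out = update.copy()
    for d in codebook:
        dot = np.sum(out * d)  # inner product with a unit-norm direction
        if dot < threshold * np.linalg.norm(out):
            out = out - dot * d  # remove the conflicting component
    return out

def step(w, g, m, codebook, lr=0.1, beta=0.9):
    """One hypothetical update: momentum, orthogonalize, resolve conflicts."""
    m = beta * m + (1 - beta) * g
    u = resolve_conflicts(orthogonalize(m), codebook)
    return w - lr * u, m
```

In this sketch, orthogonalization equalizes the singular values of the update so that no single dominant direction controls its scale, and the projection step leaves the update unchanged unless it points against a stored direction.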
RANK_REASON This is a research paper detailing a new optimization algorithm for machine learning models.
AI-generated summary · Google Gemini · from 1 source.