Researchers have introduced a principle for designing deep learning optimizers that respect the inherent symmetries of neural network architectures. Whereas existing optimizers such as Adam update parameters coordinate-wise, the proposed symmetry-compatible optimizers are designed to be equivariant to the symmetry group of each weight block. Applying the principle to embeddings, LM heads, MLPs, and MoE routers yields novel update rules for each component. In language model experiments, the new optimizers consistently improve validation loss and training stability compared with standard AdamW.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Introduces novel optimizer designs that improve training stability and final validation loss for language models.
RANK_REASON The cluster contains an academic paper detailing a new theoretical principle and experimental validation for optimizer design in deep learning.
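
The equivariance property mentioned in the summary can be illustrated with a small numerical check. The sketch below is not the paper's update rule. It assumes, purely for illustration, a weight block whose symmetry group is left multiplication by orthogonal matrices, so that a gradient G transforms to Q G. It compares a coordinate-wise sign update (the direction of Adam's first step from zero state, up to epsilon) with a polar-factor update used here as a stand-in for an equivariant rule. The function names `sign_update`, `polar_update`, and `equivariance_gap` are hypothetical.

```python
import numpy as np


def sign_update(grad):
    """Coordinate-wise update: the direction of Adam's first step from zero state."""
    return np.sign(grad)


def polar_update(grad):
    """Illustrative equivariant update: the polar factor U @ Vt of the gradient."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt


def equivariance_gap(update_fn, grad, q):
    """Largest entry of update(Q G) - Q update(G); zero means equivariant."""
    return float(np.max(np.abs(update_fn(q @ grad) - q @ update_fn(grad))))


rng = np.random.default_rng(0)
grad = rng.standard_normal((6, 4))                 # stand-in gradient of a weight block
q, _ = np.linalg.qr(rng.standard_normal((6, 6)))   # random orthogonal matrix (assumed symmetry)

sign_gap = equivariance_gap(sign_update, grad, q)
polar_gap = equivariance_gap(polar_update, grad, q)

print(f"sign update gap:  {sign_gap:.3e}")
print(f"polar update gap: {polar_gap:.3e}")
```

Under these assumptions the polar-factor gap should sit at floating-point rounding level, while the sign update's gap is far larger, which is what it means for a coordinate-wise rule to break a rotational symmetry.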