PulseAugur
research · [2 sources] ·

New optimizers respect neural network symmetries, improve training

Researchers have proposed a design principle for deep learning optimizers that respects the inherent symmetries of neural network architectures. Popular optimizers such as Adam update parameters coordinate-wise; the proposed symmetry-compatible optimizers are instead designed to be equivariant under the symmetry group of each weight block. Applying the principle to embeddings, LM heads, SwiGLU MLPs, and MoE routers yields new update rules. In language-model experiments, the resulting optimizers consistently improve validation loss and training stability relative to standard AdamW.
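To make "coordinate-wise" versus "equivariant" concrete, here is a minimal sketch in plain Python. It is not the paper's update rule: the symmetry chosen (rotating every row of a 2-column gradient matrix), the function names, and the row-normalized update are all assumptions picked for illustration. The sign update stands in for Adam, whose first bias-corrected step reduces to sign(g) as epsilon goes to zero.

```python
import math

def sign_update(grad):
    """Coordinate-wise update: the sign of each entry (Adam-like stand-in)."""
    return [[(g > 0) - (g < 0) for g in row] for row in grad]

def row_normalized_update(grad):
    """Hypothetical update: scale each row to unit Euclidean length."""
    out = []
    for row in grad:
        norm = math.sqrt(sum(g * g for g in row))
        out.append([g / norm for g in row] if norm > 0 else list(row))
    return out

def rotate_rows(mat, theta):
    """Right-multiply every 2D row by the same rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[x * c - y * s, x * s + y * c] for x, y in mat]

def max_abs_diff(a, b):
    """Largest entrywise absolute difference between two matrices."""
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def equivariance_gap(update, grad, theta):
    """Compare update(G Q) with update(G) Q; zero means equivariant under Q."""
    rotated_then_updated = update(rotate_rows(grad, theta))
    updated_then_rotated = rotate_rows(update(grad), theta)
    return max_abs_diff(rotated_then_updated, updated_then_rotated)

# Illustrative gradient for a 3-row, 2-column weight block (made-up values).
grad = [[3.0, 4.0], [-1.0, 2.0], [0.5, -0.25]]
theta = 0.7  # arbitrary rotation angle in radians

sign_gap = equivariance_gap(sign_update, grad, theta)
row_gap = equivariance_gap(row_normalized_update, grad, theta)

print(f"sign update gap:           {sign_gap:.3e}")
print(f"row-normalized update gap: {row_gap:.3e}")
```

Under these assumptions the row-normalized update commutes with the rotation up to floating-point error, while the sign update does not: rotating the coordinates changes which direction the coordinate-wise rule steps in.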

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT Introduces novel optimizer designs that improve training stability and final validation loss for language models.

RANK_REASON The cluster contains an academic paper detailing a new theoretical principle and experimental validation for optimizer design in deep learning.


COVERAGE [2]

  1. arXiv stat.ML TIER_1 · Tim Tsz-Kit Lau, Weijie Su ·

    Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

    arXiv:2605.18106v1 · Announce Type: cross · Abstract: A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its var…

  2. arXiv stat.ML TIER_1 · Weijie Su ·

    Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

    A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, renderin…