New optimizers respect neural network symmetries, improve training

By PulseAugur Editorial · [2 sources] · 2026-05-18 09:17

Researchers have introduced a principle for designing deep learning optimizers that respect the inherent symmetries of neural network architectures. Unlike Adam and its variants, which update parameters coordinate-wise, the proposed symmetry-compatible optimizers are equivariant to the symmetry group of each weight block. Applying the principle to embeddings, LM heads, SwiGLU MLPs, and MoE routers yields novel update rules. In experiments on language models, the new optimizers consistently achieve lower validation loss and more stable training than standard AdamW.
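To make the coordinate-wise versus equivariant distinction concrete, here is a minimal toy sketch. It is not one of the paper's update rules, which this summary does not spell out. It assumes a 2-D gradient vector and plane rotations as the symmetry group, and compares Adam's first-step direction (with zero-initialized moments and bias correction this reduces to g / (|g| + eps) per coordinate) against a hypothetical norm-normalized step. The names `rotate`, `normalized_step`, and `equivariance_gap` are illustrative only.

```python
import math

def rotate(vec, theta):
    """Apply a 2-D rotation by angle theta (radians) to vec."""
    c, s = math.cos(theta), math.sin(theta)
    x, y = vec
    return [c * x - s * y, s * x + c * y]

def adam_first_step(grad, eps=1e-8):
    """Adam's direction on its very first step (zero-initialized moments,
    bias correction applied): g / (|g| + eps), taken per coordinate."""
    return [g / (abs(g) + eps) for g in grad]

def normalized_step(grad, eps=1e-8):
    """Hypothetical rotation-equivariant alternative: g / (||g|| + eps).
    Assumed here for illustration; not taken from the paper."""
    norm = math.sqrt(sum(g * g for g in grad))
    return [g / (norm + eps) for g in grad]

def equivariance_gap(update, grad, theta):
    """Distance between update(R g) and R update(g).
    A gap of (numerically) zero means the update commutes with the rotation."""
    rotated_then_updated = update(rotate(grad, theta))
    updated_then_rotated = rotate(update(grad), theta)
    return math.sqrt(sum((p - q) ** 2
                         for p, q in zip(rotated_then_updated, updated_then_rotated)))

grad = [3.0, 1.0]          # arbitrary example gradient
theta = math.pi / 6        # rotate by 30 degrees

adam_gap = equivariance_gap(adam_first_step, grad, theta)
norm_gap = equivariance_gap(normalized_step, grad, theta)

print(f"coordinate-wise gap: {adam_gap:.3f}")
print(f"normalized gap:      {norm_gap:.3e}")
```

In this toy setting the coordinate-wise step gives a clearly nonzero gap, while the normalized step commutes with the rotation up to floating-point error. The symmetry groups relevant to each real weight block may differ from the plane rotation assumed here.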

IMPACT Introduces novel optimizer designs that improve training stability and final validation loss for language models.

RANK_REASON The cluster contains an academic paper detailing a new theoretical principle and experimental validation for optimizer design in deep learning.

paper
infra

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv stat.ML TIER_1 English(EN) · Tim Tsz-Kit Lau, Weijie Su · 2026-05-19 04:00

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

arXiv:2605.18106v1 · Announce Type: cross · Abstract: A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its var…

arXiv stat.ML TIER_1 English(EN) · Weijie Su · 2026-05-18 09:17

Symmetry-Compatible Principle for Optimizer Design: Embeddings, LM Heads, SwiGLU MLPs, and MoE Routers

A striking geometric disparity has long persisted in the practice of deep learning. While modern neural network architectures naturally exhibit rich symmetry and equivariance properties, popular optimizers such as Adam and its variants operate inherently coordinate-wise, renderin…
