PulseAugur

New research reveals gradient-direction sensitivity in optimizers for AI models

A new arXiv paper proposes analyzing how neural networks learn by running a rolling SVD on loss gradients rather than on AdamW optimizer updates. The approach, termed Gradient-Direction Sensitivity (GDS), changes the diagnostic by one to two orders of magnitude: the measured coupling between specific feature directions and linear centroids is far stronger than when the same analysis is run on optimizer updates, which gives a clearer picture of feature formation in parameter space. The study also reports that constraining attention updates to a rank-3 subspace identified with GDS accelerated grokking by approximately 2.3 times.
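The two operations the summary describes, a rolling SVD over recent loss gradients and a restriction of updates to a low-rank subspace, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names `top_directions` and `project_onto_subspace`, the window length of 8, the 4x5 weight shape, and the synthetic gradients are all hypothetical choices made for the example.

```python
from collections import deque

import numpy as np


def top_directions(window, k):
    """Return the top-k right singular vectors of the stacked window.

    Each matrix in `window` is flattened into one row, so the returned
    basis has shape (k, P), where P is the number of parameters.
    """
    stacked = np.stack([m.ravel() for m in window])      # (T, P)
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    return vt[:k]                                         # orthonormal rows


def project_onto_subspace(update, basis):
    """Keep only the component of `update` that lies in span(basis)."""
    flat = update.ravel()
    coeffs = basis @ flat                                 # (k,)
    return (basis.T @ coeffs).reshape(update.shape)


# Synthetic stand-in for loss gradients (an assumption for this demo):
# each gradient lies near a fixed 3-dimensional subspace, plus small noise.
rng = np.random.default_rng(0)
shape = (4, 5)
q, _ = np.linalg.qr(rng.standard_normal((shape[0] * shape[1], 3)))
true_basis = q.T                                          # (3, 20)

window = deque(maxlen=8)                                  # rolling window
for _ in range(12):
    grad = rng.standard_normal(3) @ true_basis
    grad = grad.reshape(shape) + 1e-3 * rng.standard_normal(shape)
    window.append(grad)

basis = top_directions(window, k=3)

# Restrict a hypothetical optimizer update to the rank-3 subspace.
raw_update = rng.standard_normal(shape)
constrained_update = project_onto_subspace(raw_update, basis)

print("basis shape:", basis.shape)
print("norm before:", np.linalg.norm(raw_update))
print("norm after: ", np.linalg.norm(constrained_update))
```

To compare against the optimizer-based variant described in the coverage below, one would fill the same window with AdamW updates instead of raw gradients and repeat the SVD. How the paper then scores the coupling with linear centroids is not given in this summary, so it is left out of the sketch.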

IMPACT Introduces a novel diagnostic for understanding feature formation in neural networks, potentially improving training efficiency.

RANK_REASON This is a research paper detailing a new diagnostic method for analyzing neural network training.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.LG TIER_1 English(EN) · Yongzhong Xu ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    arXiv:2604.25143v1. Abstract: We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured p…

  2. arXiv cs.LG TIER_1 English(EN) · Yongzhong Xu ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …