PulseAugur
research · [3 sources] ·

New research reveals gradient-direction sensitivity in optimizers for AI models

Researchers have proposed a diagnostic for how neural networks learn that analyzes loss gradients rather than optimizer updates. The approach, termed Gradient-Direction Sensitivity (GDS), replaces the rolling SVD of AdamW updates with a rolling SVD of loss gradients, and it reveals a coupling between specific feature directions and linear centroids that optimizer trajectories hide. According to the study, this switch increases the measured perturbative coupling by one to two orders of magnitude, giving a clearer diagnostic of feature formation in parameter space. Constraining attention updates to a rank-3 subspace using GDS also reportedly accelerated grokking by approximately 2.3 times.

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Introduces a novel diagnostic for understanding feature formation in neural networks, potentially improving training efficiency.

RANK_REASON This is a research paper detailing a new diagnostic method for analyzing neural network training.


COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Yongzhong Xu ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    arXiv:2604.25143v1. We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured p…

  2. arXiv cs.LG TIER_1 · Yongzhong Xu ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …

  3. Hugging Face Daily Papers TIER_1 ·

    Gradient-Direction Sensitivity Reveals Linear-Centroid Coupling Hidden by Optimizer Trajectories

    We show that replacing the rolling SVD of AdamW updates with a rolling SVD of loss gradients changes the diagnostic by 1-2 orders of magnitude. Performing SVD on the loss gradient instead of the AdamW update increases the measured perturbative coupling between SED directions and …
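
For readers unfamiliar with the "rolling SVD" named in the abstract, the following is a minimal sketch of the general idea, not the paper's implementation. It keeps a sliding window of flattened matrices (loss gradients or optimizer updates, whichever is fed in) and extracts their dominant directions. The names `RollingSVD` and `captured_fraction`, the window size, and the overlap measure are all illustrative assumptions; the sources above do not define how the paper computes its perturbative coupling.

```python
from collections import deque

import numpy as np


class RollingSVD:
    """Hypothetical helper: SVD over the last `window` flattened matrices."""

    def __init__(self, window, k):
        self.buffer = deque(maxlen=window)  # oldest entries drop off automatically
        self.k = k

    def push(self, matrix):
        # Flatten each gradient (or update) matrix into one row of the window.
        self.buffer.append(np.asarray(matrix, dtype=float).ravel())

    def directions(self):
        stacked = np.stack(self.buffer)  # shape: (n_steps, n_params)
        _, _, vt = np.linalg.svd(stacked, full_matrices=False)
        return vt[: self.k]  # up to k orthonormal rows in parameter space


def captured_fraction(directions, probe):
    """Fraction of the probe's squared norm lying in the span of `directions`.

    This is an assumed stand-in overlap measure, not the paper's coupling.
    """
    probe = probe / np.linalg.norm(probe)
    return float(np.sum((directions @ probe) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tracker = RollingSVD(window=5, k=3)
    for _ in range(20):
        # Stand-in for a per-step loss gradient of a 4x6 weight matrix.
        tracker.push(rng.normal(size=(4, 6)))
    probe = rng.normal(size=24)  # stand-in for a feature direction
    print(captured_fraction(tracker.directions(), probe))
```

Under these assumptions, the comparison described in the abstract would amount to running two such trackers side by side, one fed the loss gradient and one fed the AdamW update at each step, and comparing the directions each recovers.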