Local SGD Worker Disagreement Reveals Deep Neural Network Loss Geometry

Researchers propose a method for probing the loss geometry of deep neural networks by analyzing worker disagreement in Local Stochastic Gradient Descent (Local SGD). The disagreement, which theory shows is shaped by gradient noise and Hessian curvature, yields a low-cost, Hessian-free estimator of the dominant subspace of the loss landscape. In experiments with MLPs, CNNs, and Transformers, the subspaces recovered from worker-average gaps capture the gradient components that lie in the dominant Hessian eigenspace.
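
The idea can be illustrated on a toy problem. The sketch below is not the paper's algorithm or experimental setup; it is a minimal NumPy illustration under stated assumptions: a quadratic loss with a known Hessian, gradient noise whose covariance is proportional to that Hessian (a modeling assumption made here, not taken from the source), and a hypothetical helper `local_sgd_gap_subspace` that stacks the worker-average gaps from several Local SGD rounds and takes their top singular directions. It then measures how much of the true top-k Hessian eigenspace the estimate recovers.

```python
import numpy as np

def local_sgd_gap_subspace(eigvecs, eigvals, k, workers=8, local_steps=10,
                           rounds=20, lr=0.05, noise_scale=1.0, seed=0):
    """Estimate a k-dim subspace from worker-average gaps (toy quadratic loss).

    Hypothetical helper for illustration only. Assumes loss 0.5 * w^T H w and
    gradient noise with covariance noise_scale^2 * H.
    """
    rng = np.random.default_rng(seed)
    dim = eigvals.shape[0]
    H = (eigvecs * eigvals) @ eigvecs.T          # H = Q diag(lambda) Q^T
    noise_sqrt = eigvecs * np.sqrt(eigvals)      # maps z ~ N(0, I) to cov H
    w_avg = rng.normal(size=dim)
    gaps = []
    for _ in range(rounds):
        # Every worker starts the round from the shared average.
        W = np.tile(w_avg, (workers, 1))         # shape (workers, dim)
        for _ in range(local_steps):
            z = rng.normal(size=(workers, dim))
            grads = W @ H + noise_scale * (z @ noise_sqrt.T)
            W = W - lr * grads
        w_avg = W.mean(axis=0)                   # averaging (sync) step
        gaps.append(W - w_avg)                   # worker-average gaps
    G = np.concatenate(gaps, axis=0)             # (rounds * workers, dim)
    # Top right-singular vectors span the directions of largest disagreement.
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[:k].T                              # shape (dim, k)

# Toy anisotropic spectrum: 3 sharp directions plus a flat bulk.
rng = np.random.default_rng(1)
dim, k = 50, 3
Q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthonormal basis
eigvals = np.concatenate([[10.0, 8.0, 6.0], np.linspace(0.1, 0.01, dim - k)])

U_est = local_sgd_gap_subspace(Q, eigvals, k)
# Fraction of the true top-k eigenspace captured (1.0 = perfect alignment).
overlap = np.linalg.norm(Q[:, :k].T @ U_est) ** 2 / k
print(f"subspace overlap: {overlap:.3f}")
```

No Hessian-vector products are needed at estimation time; the Hessian appears here only to define the toy loss and to score the result. With isotropic noise instead of Hessian-aligned noise, the same toy model would not favor sharp directions, so the noise assumption matters for this illustration.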

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · English (EN) · Tolga Dimlioglu, Kristi Topollai, Anna Choromanska

    Worker Disagreement Reveals Sharp Directions in Local SGD

    arXiv:2605.27739v1 (cross-listed). Abstract: Deep neural network training often exhibits highly anisotropic loss geometry, where a few sharp dominant Hessian directions coexist with a large flatter bulk. Gradients tend to align disproportionately with these dominant directio…