Researchers have developed a new method for probing the loss geometry of deep neural networks by analyzing worker disagreement in Local Stochastic Gradient Descent (SGD), that is, how far each worker's parameters drift from the workers' average between synchronizations. The authors show theoretically that this disagreement depends on both gradient noise and Hessian curvature. As a result, it serves as a cheap, Hessian-free estimator of the dominant subspace of the loss landscape. Experiments with MLPs, CNNs, and Transformers show that the subspaces identified from these worker-average gaps capture the gradient components lying in the dominant Hessian eigenspace.
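The summary does not spell out the estimator, so what follows is only a toy sketch under stated assumptions, not the authors' method. It runs Local SGD on a diagonal quadratic loss whose gradient-noise covariance is assumed to be proportional to the Hessian. It then estimates the dominant subspace as the top principal directions of the collected worker-minus-average gaps, computed with orthogonal iteration. The quadratic, the noise model, the function names (local_sgd_gaps, orthonormalize, top_k_subspace), and all hyperparameters are hypothetical. The script prints how much of the estimated 3-dimensional subspace lies in the true top-3 Hessian eigenspace, where 1.0 means a perfect match.

```python
import math
import random


def local_sgd_gaps(hess_diag, num_workers=8, local_steps=10, rounds=50, lr=0.01, seed=0):
    """Run Local SGD on the toy loss 0.5 * sum_i h_i * w_i**2 and collect gaps.

    Hypothetical toy model: each stochastic gradient is H w + sqrt(H) z with
    z ~ N(0, I), i.e. gradient-noise covariance is ASSUMED proportional to H.
    Returns one worker-minus-average gap vector per worker per round.
    """
    rng = random.Random(seed)
    dim = len(hess_diag)
    avg = [1.0] * dim  # shared starting point after each synchronization
    gaps = []
    for _ in range(rounds):
        workers = [list(avg) for _ in range(num_workers)]
        for w in workers:
            for _ in range(local_steps):
                for i, h in enumerate(hess_diag):
                    noisy_grad = h * w[i] + math.sqrt(h) * rng.gauss(0.0, 1.0)
                    w[i] -= lr * noisy_grad
        # Synchronize: the new shared point is the average of the workers.
        avg = [sum(w[i] for w in workers) / num_workers for i in range(dim)]
        gaps.extend([w[i] - avg[i] for i in range(dim)] for w in workers)
    return gaps


def orthonormalize(vectors):
    """Gram-Schmidt: return an orthonormal basis spanning `vectors`."""
    basis = []
    for v in vectors:
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v])
    return basis


def top_k_subspace(vectors, k, iters=100, seed=1):
    """Top-k eigenvectors of the gaps' second-moment matrix (orthogonal iteration)."""
    dim, n = len(vectors[0]), len(vectors)
    cov = [[sum(v[i] * v[j] for v in vectors) / n for j in range(dim)] for i in range(dim)]
    rng = random.Random(seed)
    basis = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)])
    for _ in range(iters):
        # Multiply each basis vector by the covariance, then re-orthonormalize.
        product = [[sum(row[j] * b[j] for j in range(dim)) for row in cov] for b in basis]
        basis = orthonormalize(product)
    return basis


# Three sharp directions (the first three coordinates) and 17 flat ones.
hess = [10.0, 8.0, 6.0] + [0.1] * 17
gaps = local_sgd_gaps(hess)
basis = top_k_subspace(gaps, k=3)
# Fraction of the estimated subspace lying in span(e_0, e_1, e_2): 1.0 is a perfect match.
overlap = sum(b[i] ** 2 for b in basis for i in range(3)) / 3
print(f"overlap with top-3 Hessian eigenspace: {overlap:.3f}")
```

In this toy, you could replace math.sqrt(h) with a constant noise scale, which makes the noise isotropic. The gaps then become largest in the flat directions instead. So here it is the Hessian-aligned noise assumption, not curvature alone, that points the gaps at the dominant subspace. This is consistent with the disagreement depending on both gradient noise and curvature.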
AI-generated summary · Google Gemini · from 1 source.