On the Superlinear Relationship between SGD Noise Covariance and Loss Landscape Curvature
Researchers have uncovered a new relationship between the noise introduced by Stochastic Gradient Descent (SGD) and the curvature of the loss landscape in deep learning models. Their findings indicate that this noise is not directly proportional to the Hessian of the loss, as previously assumed under specific conditions. Instead, the study reveals a more general connection where the SGD noise covariance is related to the expected value of per-sample Hessians, suggesting these two factors approximately commute rather than coincide. AI
IMPACT Provides a more accurate theoretical understanding of SGD noise and its interaction with loss landscape curvature, potentially guiding future optimization algorithm development.