Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics
A new paper challenges the common assumption that the noise in Stochastic Gradient Descent (SGD) behaves like Brownian motion. The researchers propose an alternative model in which SGD evolves within a loss landscape that fluctuates because of minibatch sampling. This framework reveals distinct behaviors for SGD near critical points. In particular, it shows that the variance can grow over time in nearly-flat directions, which indicates effective diffusion.
IMPACT: Challenges a fundamental assumption about AI training dynamics, potentially leading to more nuanced optimization strategies and a better understanding of model convergence.
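To make the claim about nearly-flat directions concrete, here is a minimal toy simulation. It is not the paper's model. It assumes hypothetical one-dimensional per-sample losses l_i(x) = 0.5 * curvature * x**2 - g_i * x, so that each minibatch sees a slightly tilted copy of the same parabola. The function name `simulate_sgd` and all parameter values are illustrative choices, not taken from the source.

```python
import random
import statistics


def simulate_sgd(curvature, lr=0.1, steps=200, batch_size=8,
                 n_samples=256, n_runs=300, seed=0):
    """Across-run variance of x after each SGD step.

    Toy assumption (not the paper's model): per-sample loss
    l_i(x) = 0.5 * curvature * x**2 - g_i * x, so the minibatch
    landscape is the full-batch parabola plus a random tilt.
    """
    rng = random.Random(seed)
    # Fixed dataset of per-sample tilts, centered so the full-batch
    # gradient at x = 0 is exactly zero (x = 0 is a critical point).
    g = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mean_g = sum(g) / n_samples
    g = [gi - mean_g for gi in g]

    xs = [0.0] * n_runs          # independent SGD runs, all started at x = 0
    variances = []
    for _ in range(steps):
        for r in range(n_runs):
            batch = rng.sample(g, batch_size)    # minibatch without replacement
            g_batch = sum(batch) / batch_size
            grad = curvature * xs[r] - g_batch   # gradient of the minibatch loss
            xs[r] -= lr * grad
        variances.append(statistics.pvariance(xs))
    return variances


flat = simulate_sgd(curvature=0.0)    # exactly flat direction
curved = simulate_sgd(curvature=1.0)  # direction with a restoring force

print("flat direction:   variance at step 50 =", flat[49], " step 200 =", flat[-1])
print("curved direction: variance at step 50 =", curved[49], " step 200 =", curved[-1])
```

In this toy setting the variance in the flat direction keeps increasing with the number of steps, much like a random walk, while the curved direction settles to a small stationary value. It only illustrates the qualitative statement above under the stated assumptions and does not reproduce the paper's analysis.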