High-dimensional Limit of SGD for Diagonal Linear Networks
A new paper details the high-dimensional behavior of stochastic gradient descent (SGD) on diagonal linear networks. The research shows that in high dimensions, SGD dynamics can be accurately modeled by a stochastic differential equation. This allows for the derivation of a deterministic partial differential equation that tracks key statistics like risk and curvature, ultimately demonstrating exponential convergence to zero risk.
AI IMPACT: Provides theoretical insights into the optimization of neural network components, potentially informing future model training strategies.
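
To make the setting concrete, here is a minimal simulation sketch of one-pass SGD on a diagonal linear network. It is not the paper's construction: the parametrization `w = u * v`, the Gaussian inputs scaled by `1/sqrt(d)`, the noiseless targets, the initialization scale `alpha`, the step size `lr`, and the function name `simulate_sgd` are all assumptions chosen for illustration. Time is measured in units of `d` SGD steps, a common convention for high-dimensional limits.

```python
import numpy as np


def simulate_sgd(d=200, time_horizon=50.0, lr=0.2, alpha=1.0, seed=0):
    """One-pass SGD on a diagonal linear network (illustrative assumptions only).

    Model: f(x) = <u * v, x>, with targets y = <beta_star, x> (no label noise).
    Returns the population risk after every step.
    """
    rng = np.random.default_rng(seed)
    beta_star = rng.normal(size=d)   # hypothetical ground-truth weights
    u = np.full(d, alpha)            # first layer of the diagonal network
    v = np.zeros(d)                  # second layer, so w = u * v starts at 0
    steps = int(time_horizon * d)    # one unit of time = d SGD steps

    # For x ~ N(0, I/d), the population risk is 0.5 * mean((w - beta_star)^2).
    risks = np.empty(steps + 1)
    risks[0] = 0.5 * np.mean((u * v - beta_star) ** 2)

    for k in range(steps):
        x = rng.normal(size=d) / np.sqrt(d)   # fresh sample at every step
        residual = x @ (u * v - beta_star)    # prediction error on this sample
        grad_u = residual * x * v             # gradient of 0.5 * residual^2 in u
        grad_v = residual * x * u             # gradient of 0.5 * residual^2 in v
        u = u - lr * grad_u
        v = v - lr * grad_v
        risks[k + 1] = 0.5 * np.mean((u * v - beta_star) ** 2)

    return risks


if __name__ == "__main__":
    risks = simulate_sgd()
    d = 200
    for t in (0, 10, 20, 30, 40, 50):
        print(f"time {t:>2}: risk = {risks[t * d]:.3e}")
```

Plotting `np.log(risks)` against time gives a quick visual check: under these assumptions, a roughly straight, downward-sloping line is what exponential decay of the risk would look like. This is an empirical illustration only and does not reproduce the SDE or PDE analysis described in the summary.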