A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks
Researchers have established a mathematical connection between shock-wave theory and the learning dynamics of stochastic gradient descent in artificial neural networks. Using tools from differential geometry, Lie group theory, and fluid mechanics, they show that the effective dynamics of these networks can be described by a viscous Hamilton-Jacobi equation on a quotient manifold. The gradient of the coarse-grained loss function in turn obeys a Burgers-type equation, which makes shock formation a rigorous possibility. The framework has been applied to several architectures, including multilayer perceptrons, convolutional neural networks, Transformers, and mean-field networks, and it suggests new diagnostics for deep learning.
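To make the Burgers-type claim concrete, here is a minimal sketch in a deliberately simplified setting: one flat spatial dimension rather than the quotient manifold of the paper. In that setting, differentiating a viscous Hamilton-Jacobi equation S_t + (1/2) S_x^2 = nu S_xx with respect to x gives the viscous Burgers equation u_t + u u_x = nu u_xx for u = S_x. In the inviscid limit (nu = 0), characteristics first cross at t* = -1 / min u0'(x), which is the classical shock-formation time. The function name `breaking_time`, the grid, and the sine initial profile are illustrative assumptions and are not taken from the paper.

```python
import numpy as np


def breaking_time(u0, x):
    """Estimate the first shock time of u_t + u u_x = 0 from initial data u0.

    Uses the characteristic-crossing formula t* = -1 / min(u0'(x)).
    Illustrative helper only; it is not part of the paper's framework.
    """
    # Finite-difference approximation of the initial slope u0'(x).
    slope = np.gradient(u0(x), x)
    min_slope = slope.min()
    if min_slope >= 0:
        # No compressive region: characteristics never cross.
        return np.inf
    return -1.0 / min_slope


# Assumed toy initial profile u0(x) = sin(x) on one period.
x = np.linspace(0.0, 2.0 * np.pi, 2001)
t_star = breaking_time(np.sin, x)
print(f"estimated breaking time: {t_star:.4f}")
```

For the sine profile the steepest negative slope is -1, so the estimate should come out close to 1. A non-decreasing profile such as u0(x) = x yields an infinite breaking time, meaning no shock forms in this toy model.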
IMPACT: This theoretical framework could lead to novel diagnostics for monitoring and controlling deep learning training phases.
- transformers
- artificial neural network
- deep learning
- convolutional neural network
- stochastic gradient descent
- multilayer perceptron
- differential geometry
- Shock-wave theory for rupture of rubber
- Lie group theory of the bessel equation of the first kind of integral order
- fluid mechanics
- Hamilton-Jacobi Equations and Distance Functions on Riemannian Manifolds
- Burgers-type equation
- Mean-field Networks