Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

Researchers have established a mathematical connection between shock-wave theory and the learning dynamics of stochastic gradient descent (SGD) in artificial neural networks. Using tools from differential geometry, Lie group theory, and fluid mechanics, they show that the effective dynamics of these networks can be described by a viscous Hamilton-Jacobi equation on a quotient manifold. The gradient of the coarse-grained loss function in turn follows a Burgers-type equation, which means shock formation is possible in a rigorous sense. The framework has been applied to several architectures, including multilayer perceptrons, convolutional neural networks, Transformers, and mean-field networks, and it suggests potential for new diagnostics in deep learning.
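
The step from a viscous Hamilton-Jacobi equation to a Burgers-type equation is a standard one and can be sketched as follows. This is a minimal illustration on flat space, not the paper's own formulation: the symbols S (coarse-grained loss), u (its gradient), and ν (an effective viscosity) are assumptions introduced here, the sign and normalization conventions may differ from the authors', and any curvature terms arising on the quotient manifold are ignored.

```latex
% Assumed generic form of a viscous Hamilton-Jacobi equation for S(\theta, t):
\partial_t S + \tfrac{1}{2}\,\lvert \nabla S \rvert^{2} = \nu\, \Delta S .

% Define the gradient field u = \nabla S and take the gradient of both sides:
\partial_t u + \nabla\!\left( \tfrac{1}{2}\,\lvert u \rvert^{2} \right) = \nu\, \Delta u .

% Because u is a gradient (curl-free), \nabla(\tfrac12 |u|^2) = (u \cdot \nabla) u,
% which gives the viscous Burgers equation:
\partial_t u + (u \cdot \nabla)\, u = \nu\, \Delta u .

% In one dimension this reads
\partial_t u + u\, \partial_x u = \nu\, \partial_{xx} u .

% In the inviscid limit \nu \to 0, characteristics first cross (a shock forms) at
t^{*} = -\,\frac{1}{\min_x u_0'(x)}, \qquad \text{provided } u_0'(x) < 0 \text{ somewhere},
% where u_0(x) = u(x, 0) is the initial profile.
```

In this simplified picture, a shock corresponds to a steepening front in the gradient of the coarse-grained loss, and a positive ν smooths that front into a thin transition layer rather than a true discontinuity.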

IMPACT This theoretical framework could lead to novel diagnostics for monitoring and controlling deep learning training phases.