The Role of Symmetry in Optimizing Overparameterized Networks
A new paper analyzes how overparameterization in neural networks aids optimization by introducing additional symmetries. According to the analysis, these symmetries act as a form of preconditioning on the Hessian, leading to better-conditioned minima. Overparameterization also increases the likelihood that a global minimum lies near a typical initialization, making such minima more accessible. Experiments with teacher-student networks supported these theoretical predictions, showing improved convergence and better condition numbers as network width increased.
AI IMPACT: Provides a theoretical framework for understanding how network width affects optimization and convergence.
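
To make the notion of parameter symmetry concrete, here is a minimal NumPy sketch of two well-known symmetries of a two-layer ReLU network: permuting hidden units and positively rescaling them. It is an illustration only and is not taken from the paper, which may study other symmetries; the function name `two_layer_net` and the layer sizes are hypothetical choices. Because each transformation leaves the network's output unchanged, whole families of parameter settings share the same loss, and the number of hidden-unit permutations alone grows factorially with width.

```python
import numpy as np


def two_layer_net(x, W1, W2):
    """Two-layer ReLU network: x -> W2 @ relu(W1 @ x). Hypothetical toy model."""
    return W2 @ np.maximum(W1 @ x, 0.0)


rng = np.random.default_rng(0)
d_in, width, d_out = 3, 8, 2  # assumed sizes, chosen only for illustration

W1 = rng.normal(size=(width, d_in))   # input-to-hidden weights
W2 = rng.normal(size=(d_out, width))  # hidden-to-output weights
x = rng.normal(size=(d_in,))

out_original = two_layer_net(x, W1, W2)

# Permutation symmetry: reorder the hidden units consistently in both layers
# (rows of W1 and the matching columns of W2).
perm = rng.permutation(width)
out_permuted = two_layer_net(x, W1[perm], W2[:, perm])

# Positive rescaling symmetry (valid for ReLU): multiply a unit's incoming
# weights by a > 0 and divide its outgoing weights by the same a.
scales = rng.uniform(0.5, 2.0, size=width)
out_rescaled = two_layer_net(x, W1 * scales[:, None], W2 / scales[None, :])

permutation_invariant = bool(np.allclose(out_original, out_permuted))
rescaling_invariant = bool(np.allclose(out_original, out_rescaled))

print("permutation leaves output unchanged:", permutation_invariant)
print("rescaling leaves output unchanged:", rescaling_invariant)
```

Increasing `width` enlarges both symmetry families, which is the sense in which a wider network has more symmetric directions in parameter space; how those symmetries affect Hessian conditioning is the subject of the paper's analysis and is not reproduced here.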