A new paper analyzes how overparameterization in neural networks aids optimization by introducing additional symmetries. These symmetries act as a form of preconditioning on the Hessian, yielding better-conditioned minima. Overparameterization also increases the likelihood that global minima lie near typical initializations, making them easier to reach. Experiments with teacher-student networks confirm these theoretical predictions, showing faster convergence and lower Hessian condition numbers as network width grows.
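As a concrete illustration, here is a minimal sketch of the teacher-student setup the summary describes, not the paper's actual code: students of increasing width are trained on data from a fixed narrow teacher, and the condition number of the training-loss Hessian is measured at the minimum each one reaches. PyTorch, the architectures, widths, and training settings are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's experiment): train
# one-hidden-layer students of increasing width on data from a fixed
# narrow teacher, then report the condition number of the loss Hessian
# at the minimum each student reaches.
import torch
from torch.func import functional_call

torch.manual_seed(0)

d_in, n_samples = 5, 256
teacher = torch.nn.Sequential(
    torch.nn.Linear(d_in, 4), torch.nn.Tanh(), torch.nn.Linear(4, 1)
)
X = torch.randn(n_samples, d_in)
with torch.no_grad():
    y = teacher(X)  # labels come from the fixed teacher


def train_student(width, steps=2000, lr=1e-2):
    """Fit a one-hidden-layer student of the given width to the teacher data."""
    student = torch.nn.Sequential(
        torch.nn.Linear(d_in, width), torch.nn.Tanh(), torch.nn.Linear(width, 1)
    )
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((student(X) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return student, loss.item()


def hessian_condition_number(student):
    """Condition number of the training-loss Hessian at the student's parameters.

    Symmetry directions introduced by overparameterization appear as
    (near-)zero eigenvalues, so the ratio is taken over the meaningfully
    curved directions only.
    """
    names = [n for n, _ in student.named_parameters()]
    shapes = [p.shape for _, p in student.named_parameters()]
    sizes = [p.numel() for _, p in student.named_parameters()]
    theta0 = torch.cat([p.detach().reshape(-1) for p in student.parameters()])

    def flat_loss(theta):
        # Rebuild the parameter dict from the flat vector so autograd can
        # differentiate the loss with respect to a single tensor.
        chunks = torch.split(theta, sizes)
        params = {n: c.reshape(s) for n, c, s in zip(names, chunks, shapes)}
        pred = functional_call(student, params, (X,))
        return ((pred - y) ** 2).mean()

    H = torch.autograd.functional.hessian(flat_loss, theta0)
    eig = torch.linalg.eigvalsh(H)
    curved = eig[eig > 1e-6 * eig.max()]  # drop flat/symmetry directions
    return (eig.max() / curved.min()).item()


for width in (4, 16, 64):
    student, final_loss = train_student(width)
    print(f"width={width:3d}  loss={final_loss:.2e}  "
          f"cond(H)~{hessian_condition_number(student):.2e}")
```

If the summary's claim holds, the wider students should reach lower training loss and show smaller condition numbers over the curved directions, with the symmetry-induced flat directions filtered out as near-zero eigenvalues.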
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Provides a theoretical framework for understanding how network width impacts optimization and convergence.
RANK_REASON Academic paper on theoretical aspects of neural network optimization.