Researchers have analyzed how Hessian eigenvectors evolve during neural network training and found that the behavior depends on the optimizer. Under SGD, the leading curvature directions tend to stabilize over time, whereas under Adam these eigenvectors reorganize significantly. Adam also exhibits a localization phenomenon in which a small set of parameters disproportionately influences the leading curvature.
AI IMPACT Provides deeper insight into how optimizers such as SGD and Adam shape neural network training, which could guide future algorithm development.
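To make "leading curvature direction" and "localization" concrete, here is a minimal sketch in Python. It is not the method from the summarized study, which the summary does not describe. The two toy matrices stand in for Hessians, and both the overlap score and the inverse participation ratio are assumed, commonly used diagnostics chosen for illustration. All function names are hypothetical.

```python
import numpy as np

def leading_eigvec(hess, iters=500, seed=0):
    """Power iteration: approximate the eigenvector of a symmetric
    matrix whose eigenvalue has the largest magnitude."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(hess.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = hess @ v
        v = w / np.linalg.norm(w)
    return v

def overlap(u, v):
    """|cos angle| between two unit vectors; 1 means the same direction."""
    return abs(float(u @ v))

def participation_ratio(v):
    """Inverse participation ratio: roughly how many coordinates carry
    the vector's mass (about n if spread out, about 1 if localized)."""
    p = v**2 / np.sum(v**2)
    return 1.0 / float(np.sum(p**2))

n = 50
spread = np.ones(n) / np.sqrt(n)   # direction shared by all parameters
local = np.zeros(n)
local[0] = 1.0                     # direction dominated by one parameter

# Toy "Hessians" (assumption): one sharp direction plus a small isotropic term.
h_spread = 10.0 * np.outer(spread, spread) + 0.1 * np.eye(n)
h_local = 10.0 * np.outer(local, local) + 0.1 * np.eye(n)

v_spread = leading_eigvec(h_spread)
v_local = leading_eigvec(h_local)

print("participation ratio (spread):", participation_ratio(v_spread))  # close to n
print("participation ratio (local): ", participation_ratio(v_local))   # close to 1
print("overlap between directions:  ", overlap(v_spread, v_local))
```

In this toy picture, an overlap near 1 between the leading eigenvectors at two training checkpoints would correspond to a stable curvature direction, a low overlap to reorganization, and a participation ratio far below the parameter count to localization. For real networks the Hessian is too large to form explicitly, so the matrix product inside the power iteration would be replaced by a Hessian-vector product.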
AI-generated summary · Google Gemini · from 2 sources.