This paper investigates how preconditioned gradient descent (PGD) methods, such as Gauss-Newton, affect spectral bias and grokking in neural networks. Spectral bias is the tendency of networks to learn low frequencies first, which can hinder the capture of fine-scale structure; the authors propose that PGD can mitigate it. They also suggest that PGD can shorten the delay associated with grokking, a delayed-generalization effect hypothesized to occur during the transition from the Neural Tangent Kernel (NTK) regime to a feature-rich learning regime. Experimental results support the view that grokking reflects this transition, with PGD enabling more uniform exploration of the parameter space.
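As a rough illustration of why preconditioning can even out learning speeds, the sketch below compares plain gradient descent with a Gauss-Newton-style preconditioned step on a toy quadratic loss. It is not the paper's experiment: the function name `mode_errors`, the eigenvalues, the learning rate, and the step count are all assumptions chosen for illustration, and the matrix `H` merely stands in for the Gauss-Newton matrix of a linear least-squares problem.

```python
import numpy as np

def mode_errors(precondition, steps=30, lr=0.5, seed=0):
    """Return the remaining error along each eigen-direction ("mode").

    Toy loss: L(w) = 0.5 * (w - w_star)^T H (w - w_star).
    Every mode starts with unit error, so the returned values are
    also the fractions of the initial error that remain.
    """
    rng = np.random.default_rng(seed)
    # Hypothetical spectrum: large eigenvalues play the role of the
    # quickly learned components, small ones the slowly learned ones.
    eigvals = np.array([1.0, 0.1, 0.01, 0.001])
    n = eigvals.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # orthonormal modes
    H = Q @ np.diag(eigvals) @ Q.T
    w_star = Q @ np.ones(n)  # unit initial error in every mode
    w = np.zeros(n)
    for _ in range(steps):
        grad = H @ (w - w_star)
        # Preconditioned step solves H d = grad; plain GD uses grad itself.
        step = np.linalg.solve(H, grad) if precondition else grad
        w = w - lr * step
    return np.abs(Q.T @ (w - w_star))

gd = mode_errors(precondition=False)
pgd = mode_errors(precondition=True)
print("GD  error per mode:", gd)
print("PGD error per mode:", pgd)
```

In this toy setting, plain gradient descent shrinks the error in mode `i` by a factor of `1 - lr * eigvals[i]` per step, so modes with small eigenvalues barely move, while the preconditioned step shrinks every mode by the same factor `1 - lr`. Real networks are nonlinear, so this only hints at the mechanism the summary describes.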
Impact: Deepens understanding of neural network training dynamics, potentially leading to more efficient learning algorithms for complex tasks.
Ranking rationale: Academic paper presenting theoretical and empirical results on how preconditioned gradient descent affects neural network convergence behavior.
AI-generated summary · Google Gemini · from 1 source.