PulseAugur

Paper explores preconditioned gradient descent's impact on neural network learning regimes

This paper investigates how preconditioned gradient descent (PGD) methods, such as Gauss-Newton, affect spectral bias and grokking in neural networks. The authors propose that PGD can mitigate spectral bias, the tendency of networks to learn low frequencies first, which can hinder the capture of fine-scale structure. The study also suggests that PGD can reduce the delays associated with grokking, a delayed-generalization effect hypothesized to occur during the transition from the Neural Tangent Kernel (NTK) regime to a feature-rich learning regime. Experimental results support the view that grokking is this transitional behavior, with PGD enabling more uniform exploration of the parameter space.
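The contrast between plain and preconditioned gradient descent can be sketched on a toy problem. The example below is an illustrative assumption, not the paper's method or experiments: it fits a linear least-squares model whose sinusoidal features shrink with frequency, so plain gradient descent learns the low-frequency coefficient quickly and the high-frequency one slowly. The names `make_problem` and `run`, the frequencies, the step sizes, and the damping value are all hypothetical choices. For a linear model, the Gauss-Newton matrix `J.T @ J / n` coincides with the Hessian, which is why the preconditioned run contracts every mode at the same rate.

```python
import numpy as np


def make_problem(n=64):
    """Toy regression: sinusoidal features whose scale decays with frequency."""
    x = np.linspace(0.0, 1.0, n, endpoint=False)
    freqs = np.array([1, 2, 4, 8])
    # Column k is sin(2*pi*k*x) / k, so higher frequencies have smaller curvature.
    J = np.stack([np.sin(2 * np.pi * k * x) / k for k in freqs], axis=1)
    w_true = freqs.astype(float)  # every frequency has unit amplitude in the target
    y = J @ w_true
    return J, y, w_true


def run(J, y, steps, lr, precondition, damping=1e-8):
    """Minimize 0.5 * mean((J @ w - y) ** 2) from w = 0."""
    n, p = J.shape
    w = np.zeros(p)
    for _ in range(steps):
        grad = J.T @ (J @ w - y) / n
        if precondition:
            # Damped Gauss-Newton matrix; exact Hessian for this linear model.
            G = J.T @ J / n + damping * np.eye(p)
            step = np.linalg.solve(G, grad)
        else:
            step = grad
        w = w - lr * step
    return w


J, y, w_true = make_problem()

# Relative error per frequency after 10 steps, ordered low to high frequency.
gd_err = np.abs(run(J, y, 10, 1.0, precondition=False) - w_true) / w_true
pgd_err = np.abs(run(J, y, 10, 0.5, precondition=True) - w_true) / w_true

print("plain GD error per frequency:        ", gd_err)
print("preconditioned GD error per frequency:", pgd_err)
```

In this sketch the plain-GD errors differ by orders of magnitude across frequencies, while the preconditioned errors are nearly identical. A neural network is nonlinear, so this only illustrates the NTK-like linearized picture, not the transition to the rich regime that the paper studies.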

Impact: Deepens understanding of neural network training dynamics, potentially leading to more efficient learning algorithms for complex tasks.

Ranking rationale: Academic paper presenting theoretical and empirical results on how preconditioned gradient descent affects neural network convergence behavior.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source. How we write summaries →

Sources [1]

  1. arXiv cs.LG TIER_1 English(EN) · Shuai Jiang, Alexey Voronin, Eric Cyr, Ben Southworth ·

    On the Convergence Behavior of Preconditioned Gradient Descent Toward the Rich Learning Regime

    arXiv:2601.03162v2. Abstract: Spectral bias, the tendency of neural networks to learn low frequencies first, can be both a blessing and a curse. While it enhances the generalization capabilities by suppressing high-frequency noise, it can be a limitation in …