This paper investigates feature learning in two-layer neural networks of linear width (width proportional to the input dimension), comparing the effect of two gradient descent steps with that of one. It gives a detailed spectral characterization of the updated weights, showing that they form a spiked random matrix with multiple learned directions. It further highlights that reusing batches lets the network capture directions beyond a single information exponent, a benefit that persists in the high-dimensional limit.
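The "spiked random matrix" picture can be made concrete with a toy sketch in Python with NumPy. This is not the paper's construction. Instead of running two gradient steps, it assumes the learned part of the first-layer weights is a rank-two signal with hypothetical strengths 8 and 6 along random orthonormal directions, added to a Gaussian initialization with N(0, 1/d) entries. The helper names `spiked_weights` and `spectral_summary`, the sizes, and the outlier margin are all illustrative choices. A planted direction that is strong enough shows up as a singular value separated from the bulk edge 1 + sqrt(N/d), and the corresponding right singular vectors align with the planted input-space directions.

```python
import numpy as np


def spiked_weights(n_neurons, dim, strengths, rng):
    """Toy spiked weight matrix: Gaussian bulk plus a low-rank signal.

    Illustrative assumption: the signal directions are random orthonormal
    vectors, not directions computed from actual gradient steps.
    """
    # Bulk: i.i.d. N(0, 1/dim) entries, mimicking a standard initialization.
    bulk = rng.standard_normal((n_neurons, dim)) / np.sqrt(dim)
    rank = len(strengths)
    # Orthonormal left (neuron-space) and right (input-space) directions.
    left, _ = np.linalg.qr(rng.standard_normal((n_neurons, rank)))
    right, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    # Low-rank signal sum_k strengths[k] * left[:, k] right[:, k]^T.
    signal = left @ np.diag(strengths) @ right.T
    return bulk + signal, right


def spectral_summary(weights, true_right, margin=0.5):
    """Count singular values above the bulk edge and measure alignment."""
    n_neurons, dim = weights.shape
    # Top edge of the bulk (Marchenko-Pastur) for an n_neurons x dim matrix
    # with N(0, 1/dim) entries, as both sizes grow proportionally.
    edge = 1.0 + np.sqrt(n_neurons / dim)
    _, svals, vt = np.linalg.svd(weights, full_matrices=False)
    # Heuristic outlier count: singular values clearly above the edge.
    n_out = int(np.sum(svals > edge + margin))
    # Subspace overlap in [0, 1] between estimated and planted right directions.
    est_right = vt[:n_out].T
    overlap = np.linalg.norm(est_right.T @ true_right) ** 2 / true_right.shape[1]
    return n_out, svals, overlap


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, v = spiked_weights(400, 200, [8.0, 6.0], rng)
    n_out, svals, overlap = spectral_summary(W, v)
    print("outliers:", n_out)
    print("top singular values:", np.round(svals[:4], 2))
    print("subspace overlap:", round(float(overlap), 3))
```

With N = 400 and d = 200 the bulk edge is about 2.41, so both planted strengths should appear as outliers, while a weak spike (for example strength 0.3) should stay within the bulk and produce none. The fixed margin is an arbitrary simplification for the sketch, not a principled detection threshold.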
Impact: Provides a mathematical framework for understanding optimization and feature learning in overparameterized networks.
Ranking rationale: Academic paper published on arXiv detailing theoretical advances in neural network feature learning.
AI-generated summary · Google Gemini · From 2 sources.