This paper investigates feature learning in two-layer neural networks of linear width, comparing the effect of two gradient descent steps with that of one. It provides a detailed spectral characterization of the updated weights, showing that they form a spiked random matrix with multiple learned directions. It also shows that reusing batches lets the network capture directions beyond a single information exponent, a benefit that persists in the high-dimensional limit.
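As a rough illustration of what a "spiked random matrix" means here, the following minimal sketch takes a single large gradient step on the first-layer weights of a toy two-layer network and inspects the singular values of the result. Everything in it is an assumption chosen for illustration rather than the paper's setup: the linear single-index teacher `beta_star`, the `tanh` activation, the second-layer scaling `a`, the step size `eta = p`, and the problem sizes. It covers only the one-step, single-direction case, not the two-step or batch-reuse analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, p = 4000, 200, 200  # samples, input dimension, width (width proportional to d)

# Assumed teacher: a linear single-index target y = <beta_star, x>.
beta_star = rng.standard_normal(d)
beta_star /= np.linalg.norm(beta_star)
X = rng.standard_normal((n, d))
y = X @ beta_star

# Assumed student: f(x) = a^T tanh(W x), with the second layer a held fixed.
W0 = rng.standard_normal((p, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], size=p) / p


def grad_W(W, a, X, y):
    """Gradient of 0.5 * mean((y - f(x))^2) with respect to W."""
    Z = X @ W.T                                   # pre-activations, shape (n, p)
    resid = y - np.tanh(Z) @ a                    # residuals, shape (n,)
    S = (1.0 - np.tanh(Z) ** 2) * resid[:, None] * a[None, :]
    return -(S.T @ X) / X.shape[0]                # shape (p, d)


eta = float(p)                                    # assumed large step size
W1 = W0 - eta * grad_W(W0, a, X, y)

s0 = np.linalg.svd(W0, compute_uv=False)          # spectrum at initialization
_, s1, Vt1 = np.linalg.svd(W1)                    # spectrum after one step
alignment = abs(Vt1[0] @ beta_star)               # overlap of top direction with teacher

print("top two singular values at init:  ", s0[:2])
print("top two singular values after step:", s1[:2])
print("alignment with beta_star:", alignment)
```

In this toy setting the updated matrix behaves like the random initialization plus an approximately rank-one term, so one singular value separates from the bulk and its right singular vector points roughly along the teacher direction. The summary's claim concerns several such learned directions after two steps, which this sketch does not attempt to reproduce.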
IMPACT Provides a mathematical framework for understanding optimization and feature learning in overparameterized networks.
RANK_REASON Academic paper published on arXiv detailing theoretical advancements in neural network feature learning.
AI-generated summary · Google Gemini · from 2 sources.