A new research paper explores feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($\mu$P). The study establishes four key structural results, including the global existence and uniqueness of the mean-field limit for noisy gradient descent. It also characterizes the identifiability of this limit and demonstrates that the active support of the long-time limit measure admits a sparse-dictionary decomposition under specific conditions. The research further decomposes the total feature-learning error into several components, offering a detailed analysis of the learning process. AI
IMPACT This research provides theoretical insights into the feature learning capabilities of wide neural networks, potentially informing future model architectures and training methodologies.
RANK_REASON The cluster contains an academic paper detailing theoretical research in machine learning.
AI-generated summary · Google Gemini · from 2 sources. How we write summaries →