PulseAugur

New research details incremental learning in overparameterized ReLU networks

Researchers have published a paper analyzing incremental learning in mildly overparameterized ReLU networks trained on orthogonal data. The study proves that, as the initialization scale approaches zero, the gradient flow converges to a saddle-to-saddle jump process in which new neurons activate one after another. As a result, networks can interpolate the training data with a width proportional to the logarithm of the number of samples. The paper also establishes a novel implicit bias: the squared L2-norm of the learned interpolator scales with the square root of the number of samples, closely approximating the minimal L2-norm interpolator.
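The plateau-then-drop behavior described above can be seen in a toy experiment. The sketch below is an illustration only, not the paper's construction: it replaces gradient flow with plain gradient descent, uses the standard basis vectors as orthogonal inputs, and the function name `train_small_init`, the labels, the width, the step size, and the initialization scale are all assumptions chosen for readability.

```python
import numpy as np

def train_small_init(y, width=12, init_scale=1e-4, lr=0.01, steps=3000, seed=0):
    """Gradient descent on f(x) = sum_j a_j * relu(w_j . x) with squared loss.

    Illustrative assumptions (not taken from the paper): inputs are the
    standard basis vectors, and all hyperparameters are arbitrary.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    rng = np.random.default_rng(seed)
    X = np.eye(n)                                      # orthonormal inputs x_i = e_i
    W = init_scale * rng.standard_normal((width, n))   # hidden-layer weights
    a = init_scale * rng.standard_normal(width)        # output weights
    losses = []
    for _ in range(steps):
        pre = X @ W.T                                  # pre-activations, shape (n, width)
        act = np.maximum(pre, 0.0)                     # ReLU
        resid = act @ a - y                            # prediction minus label
        losses.append(0.5 * float(resid @ resid))      # loss before this update
        grad_a = act.T @ resid
        # A neuron that is inactive on a sample gets no gradient from that sample.
        grad_W = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X
        a -= lr * grad_a
        W -= lr * grad_W
    return np.array(losses)

# Labels of different magnitude, so samples tend to be picked up at different times.
losses = train_small_init([4.0, -3.0, 2.0, -1.0])
for step in range(0, losses.size, 300):
    print(f"step {step:4d}  loss {losses[step]:.4f}")
```

With a tiny initialization the loss first sits near its starting value and then falls in stages rather than smoothly. Whether every sample ends up fitted depends on the random draw, because in this toy setup a neuron that starts inactive on a sample never receives a gradient from it.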

IMPACT Provides theoretical grounding for understanding optimization dynamics in neural networks, potentially informing future model architectures and training strategies.

RANK_REASON The cluster contains an academic paper published on arXiv detailing theoretical research in machine learning.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv stat.ML TIER_1 English(EN) · James Town, Etienne Boursier, Ben Lewis, Matthias Englert, Ranko Lazic

    Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

    arXiv:2605.27097v1. Abstract: The successful training of neural networks hinges on the use of first order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparamete…

  2. arXiv stat.ML TIER_1 English(EN) · Ranko Lazic

    Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

    The successful training of neural networks hinges on the use of first order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparameterization. In this work, we study the gradient flow…