Gradient Descent with Large Step Size Redistributes Signals in Deep Networks

By PulseAugur Editorial · [1 source] · 2026-06-05 04:00

Researchers have demonstrated that discrete Gradient Descent with a large step size leads to a different outcome than Gradient Flow in deep linear networks with multiple pathways. While Gradient Flow predicts a "winner-takes-all" scenario where features concentrate in single pathways, this study shows that large-step Gradient Descent can cause signals to redistribute across pathways. This re-balancing phase, occurring at the Edge of Stability, favors shared representations over persistent single-pathway dominance, clarifying how network depth influences pathway competition.
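To build intuition for the contrast described above, here is a minimal Python sketch of a toy model. The model is an assumption made for illustration and is not the paper's setup or analysis: each pathway is a chain of `depth` equal scalar weights, the network gain is the sum of the pathway gains, and the loss is squared error against a scalar `target`. The function names, initial values, and step size are all hypothetical. The sketch compares the sharpness (top Hessian eigenvalue) of a single-pathway solution with that of an even split, then runs small-step Gradient Descent as a stand-in for Gradient Flow.

```python
# Toy model (an assumption for illustration, not the paper's setup):
# P parallel pathways, each a chain of `depth` equal scalar weights u[p].
# Network gain f(u) = sum_p u[p]**depth, loss = 0.5 * (target - f)**2.

def gain(u, depth):
    """Total input-output gain: the sum of each pathway's weight product."""
    return sum(x ** depth for x in u)

def gd_step(u, depth, target, eta):
    """One discrete gradient-descent step with step size eta.
    Balanced chains stay balanced, so one scalar per pathway suffices."""
    err = target - gain(u, depth)
    return [x + eta * err * x ** (depth - 1) for x in u]

def sharpness_at_minimum(u, depth):
    """Top Hessian eigenvalue at a zero-loss point. There the Hessian is the
    rank-one matrix grad(f) grad(f)^T, so the eigenvalue is ||grad f||^2
    taken over all depth * P individual weights."""
    return depth * sum(x ** (2 * (depth - 1)) for x in u)

def share_of_first(u, depth):
    """Fraction of the total gain carried by pathway 0."""
    return u[0] ** depth / gain(u, depth)

target = 1.0

# Two zero-loss solutions with two pathways: one dominant pathway vs. an even split.
for depth in (2, 3):
    single = [target ** (1 / depth), 0.0]
    shared = [(target / 2) ** (1 / depth)] * 2
    print(depth, sharpness_at_minimum(single, depth), sharpness_at_minimum(shared, depth))

# Small-step run standing in for Gradient Flow (hypothetical initial weights).
depth, eta = 3, 0.01
u = [0.6, 0.5]
initial_share = share_of_first(u, depth)
for _ in range(20000):
    u = gd_step(u, depth, target, eta)
final_share = share_of_first(u, depth)
final_error = target - gain(u, depth)
print(initial_share, final_share, final_error)
```

In this toy model the small-step run increases the share carried by the initially stronger pathway, in the spirit of the "winner-takes-all" tendency. Under the standard quadratic approximation, a minimum is stable for Gradient Descent only if its sharpness is below 2/η. At depth 3 the single-pathway solution has sharpness 3 while the even split has about 2.38, so a step size with 2/η between those values would destabilize the former but not the latter. At depth 2 the two values coincide, which suggests one reason depth matters here. Whether a large-step run actually settles on a shared solution depends on initialization and step size, and is not verified by this sketch.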

IMPACT Clarifies how discrete gradient descent dynamics can lead to shared representations, contrasting with theoretical predictions of pathway specialization.

RANK_REASON The cluster contains an academic paper detailing a new theoretical finding in machine learning.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Hee-Sung Kim, Sungyoon Lee · 2026-06-05 04:00

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

arXiv:2606.05219v1. Abstract: Recent analyses of multi-pathway Deep Linear Networks use Gradient Flow to predict a "winner-takes-all" specialization in which path symmetry breaks and each feature concentrates in a single pathway. In this work, we show that discr…
