Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway
Researchers have demonstrated that discrete Gradient Descent (GD) with a large step size leads to a different outcome than Gradient Flow (GF) in deep linear networks with multiple pathways. Whereas Gradient Flow predicts a "winner-takes-all" outcome in which features concentrate in a single pathway, this study shows that large-step GD can redistribute the signal across pathways. This re-balancing phase, which occurs at the Edge of Stability, favors shared representations over persistent single-pathway dominance and clarifies how network depth shapes the competition between pathways.
IMPACT Clarifies how discrete gradient descent dynamics can lead to shared representations, in contrast to the pathway specialization predicted by Gradient Flow analyses.
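One simple way to build intuition for why large steps can favor shared solutions is linear stability: near a minimum, GD with step size η is stable only if the sharpness (the largest Hessian eigenvalue of the loss) stays below 2/η, the threshold associated with the Edge of Stability. The sketch below assumes a hypothetical toy model, not the study's setup: a scalar network with K pathways, each the product of L scalar weights, fit to a single target y with squared loss. It compares the sharpness of two balanced global minima, one where every pathway carries y/K and one where a single pathway carries all of y. All function names are illustrative, and this is a back-of-the-envelope calculation rather than the paper's analysis.

```python
import numpy as np

# Hypothetical toy model (an assumption, not the paper's architecture):
# w has shape (K, L); pathway k computes prod_l w[k, l]; outputs are summed.
def network_output(w):
    return float(np.prod(w, axis=1).sum())

def loss(w, y):
    # Squared loss on a single scalar target y.
    return 0.5 * (network_output(w) - y) ** 2

def sharpness_at_minimum(w):
    # At a global minimum the residual is zero, so the Hessian reduces to
    # J J^T with J[k, l] = prod_{m != l} w[k, m]; its top eigenvalue is ||J||^2.
    K, L = w.shape
    J = np.array([[np.prod(np.delete(w[k], l)) for l in range(L)]
                  for k in range(K)])
    return float(np.sum(J ** 2))

def finite_difference_sharpness(w, y, eps=1e-4):
    # Independent check: build the full Hessian by central differences
    # and return its largest eigenvalue.
    x = w.ravel().astype(float)
    n = x.size
    f = lambda v: loss(v.reshape(w.shape), y)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n)
            ei[i] = eps
            ej = np.zeros(n)
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps ** 2)
    return float(np.linalg.eigvalsh((H + H.T) / 2).max())

def balanced_minimum(y, K, L, shared):
    # Balanced global minimum (requires y > 0): every pathway carries y / K
    # ("shared"), or pathway 0 carries all of y and the others are zero
    # ("winner-takes-all").
    w = np.zeros((K, L))
    if shared:
        w[:] = (y / K) ** (1.0 / L)
    else:
        w[0] = y ** (1.0 / L)
    return w

def is_linearly_stable(sharpness, eta):
    # GD linearized around a minimum is stable only if sharpness < 2 / eta.
    return sharpness < 2.0 / eta

if __name__ == "__main__":
    y, K, L, eta = 1.0, 2, 4, 0.6
    for shared in (True, False):
        w = balanced_minimum(y, K, L, shared)
        s = sharpness_at_minimum(w)
        print("shared" if shared else "winner", round(s, 4),
              is_linearly_stable(s, eta))
```

For y = 1, K = 2, L = 4 the winner-takes-all minimum has sharpness 4 while the shared one has 2^1.5 ≈ 2.83, so a step size such as η = 0.6 (threshold 2/η ≈ 3.33) leaves only the shared minimum linearly stable. In this toy model the two minima tie at L = 2 and the shared one is strictly flatter for L > 2, which is one simple way depth could enter the competition between pathways.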