Researchers have demonstrated that discrete Gradient Descent with a large step size reaches a different outcome than Gradient Flow in deep linear networks with multiple pathways. Gradient Flow predicts a "winner-takes-all" scenario in which features concentrate in single pathways, but this study shows that large-step Gradient Descent can cause signals to redistribute across pathways. This re-balancing phase, which occurs at the Edge of Stability, favors shared representations over persistent single-pathway dominance and clarifies how network depth influences pathway competition.
AI IMPACT Clarifies how discrete gradient descent dynamics can lead to shared representations, in contrast to theoretical predictions of pathway specialization.
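The contrast between small-step and large-step training described above can be explored with a toy model. The sketch below is not taken from the paper: it assumes a scalar network in which each pathway is a chain of scalar weights, the output is the sum of the pathway products, and the target is fixed at 1. The function name `train_pathways`, the initial weights, and the step size are all hypothetical choices made for illustration. A small `eta` roughly follows Gradient Flow; it does not by itself reproduce the Edge of Stability behavior the study reports.

```python
import numpy as np

def train_pathways(eta, steps, w0):
    """Plain gradient descent on a toy multi-pathway deep linear model.

    w0 has shape (P, L): P parallel pathways, each a chain of L scalar weights.
    Output = sum over pathways of the product of that pathway's weights.
    Loss   = 0.5 * (output - 1)^2, with the target fixed at 1 (an assumption).
    """
    w = np.array(w0, dtype=float)
    num_paths, depth = w.shape
    for _ in range(steps):
        residual = w.prod(axis=1).sum() - 1.0
        grad = np.empty_like(w)
        for p in range(num_paths):
            for l in range(depth):
                # d(output)/d(w[p, l]) is the product of the other weights
                # in the same pathway.
                grad[p, l] = residual * np.prod(np.delete(w[p], l))
        w -= eta * grad
    strengths = w.prod(axis=1)  # end-to-end strength of each pathway
    loss = 0.5 * (strengths.sum() - 1.0) ** 2
    return strengths, loss

# Two pathways of depth 3; the first starts slightly ahead (hypothetical values).
w0 = [[0.5, 0.5, 0.5], [0.4, 0.4, 0.4]]
small_strengths, small_loss = train_pathways(eta=0.01, steps=5000, w0=w0)
print("small step:", small_strengths, small_loss)
```

To probe the large-step regime, the same function can be called with `eta` raised toward 2 divided by the largest Hessian eigenvalue at the solution, the usual stability threshold for gradient descent. Whether pathway strengths then re-balance in this toy setting is left as an experiment, not a claim.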
RANK_REASON The cluster contains an academic paper detailing a new theoretical finding in machine learning.
AI-generated summary · Google Gemini · from 1 source.