Brief

last 24h

[2/2] 223 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

TOOL · arXiv cs.LG English(EN) · 7h

Gradient Descent with Large Step Size Restores Symmetry in Deep Linear Networks with Multi-Pathway

Researchers have shown that discrete Gradient Descent with a large step size reaches a different outcome than Gradient Flow in deep linear networks with multiple pathways. Gradient Flow predicts a "winner-takes-all" scenario in which features concentrate in single pathways, whereas large-step Gradient Descent can redistribute signals across pathways. This re-balancing phase occurs at the Edge of Stability and favors shared representations over persistent single-pathway dominance, which clarifies how network depth influences pathway competition.

IMPACT Clarifies how discrete gradient descent dynamics can lead to shared representations, contrasting with theoretical predictions of pathway specialization.
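
As a rough illustration of why step size matters, the sketch below runs plain gradient descent on a one-dimensional quadratic, f(x) = 0.5 * curvature * x^2. This is a toy stand-in, not the multi-pathway network from the paper, and the function name and parameter values are hypothetical. It shows only the standard fact that gradient descent on a quadratic is stable when the step size is below 2 / curvature and diverges above it, which is the threshold the term "Edge of Stability" refers to.

```python
def gradient_descent_quadratic(curvature, step_size, x0=1.0, num_steps=50):
    """Run gradient descent on f(x) = 0.5 * curvature * x**2.

    Returns the list of iterates, starting with x0.
    Hypothetical helper for illustration only.
    """
    xs = [x0]
    x = x0
    for _ in range(num_steps):
        grad = curvature * x          # f'(x) = curvature * x
        x = x - step_size * grad      # each step multiplies x by (1 - step_size * curvature)
        xs.append(x)
    return xs


curvature = 4.0                       # stability threshold is 2 / curvature = 0.5

small = gradient_descent_quadratic(curvature, step_size=0.10)     # well below threshold
near_edge = gradient_descent_quadratic(curvature, step_size=0.49) # just below threshold
unstable = gradient_descent_quadratic(curvature, step_size=0.51)  # just above threshold

print("small step, final |x|:     ", abs(small[-1]))
print("near the edge, final |x|:  ", abs(near_edge[-1]))
print("above threshold, final |x|:", abs(unstable[-1]))
```

With a small step the iterate shrinks monotonically. Just below the threshold it flips sign at every step and decays slowly, and just above the threshold it oscillates with growing magnitude. Gradient Flow, the continuous-time limit, has no such threshold, which is one reason the two can behave differently.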

TOOL · arXiv cs.LG English(EN) · 7h

Deciphering Two Training Clocks in Grokking via Deep Linear Network Theory with Conditional ReLU Reduction

Researchers have developed a theoretical framework to explain "grokking" in machine learning, where a model fits the training data and learns a generalizable rule on different timescales. They propose "two training clocks" to distinguish the rapid decrease in classification loss from the slower simplification of the model's internal representation. The theory is first demonstrated with deep linear networks and then extended to explain similar behavior in ReLU MLPs, suggesting a two-stage learning process in which the classifier adapts first and the representation is refined afterwards.

IMPACT Provides a theoretical account of grokking as a two-stage learning process, which could guide future model development and training strategies.
