A new paper investigates the role of weight decay in the stability of deep learning training, challenging the common view of it as a simple regularization technique. The research analyzes how weight decay affects parameter dynamics and loss sharpness at the "Edge of Stability," showing that it slows progressive sharpening. The study also reveals an architecture-dependent phase transition, driven by the alignment of parameter vectors and sharpness gradients: weight decay dampens oscillations in CNNs but stabilizes sharpness below a theoretical boundary in MLPs.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Investigates fundamental mechanisms of training stability, potentially leading to more robust and efficient deep learning model development.
RANK_REASON This is a research paper published on arXiv detailing novel findings about a machine learning technique.
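For readers unfamiliar with the "theoretical boundary" the summary mentions, the sketch below illustrates the classical stability limit for plain gradient descent on a one-dimensional quadratic: the iterates contract only while the learning rate times the total curvature stays below 2. This is a toy illustration under stated assumptions, not the paper's analysis or experimental setup. The function name `run_gd` and all numeric values are hypothetical and chosen only for demonstration.

```python
def run_gd(curvature, lr, weight_decay, steps=50, x0=1.0):
    """Gradient descent on 0.5 * curvature * x**2 with an L2 weight decay term.

    Hypothetical toy setup: each step multiplies x by
    (1 - lr * (curvature + weight_decay)), so |x| shrinks only when
    lr * (curvature + weight_decay) < 2.
    """
    x = x0
    for _ in range(steps):
        grad = curvature * x + weight_decay * x  # loss gradient plus decay term
        x = x - lr * grad
    return abs(x)


lr = 0.1
threshold = 2.0 / lr  # classical limit on total curvature for this learning rate

# Total curvature 16 is below the threshold of 20, so the iterate contracts.
below = run_gd(curvature=15.0, lr=lr, weight_decay=1.0)

# Total curvature 20.5 is above the threshold, so the iterate grows.
above = run_gd(curvature=19.5, lr=lr, weight_decay=1.0)

print(f"threshold={threshold:.1f}, below={below:.3e}, above={above:.3e}")
```

In this toy setting the decay term simply adds to the curvature of the objective being optimized. The paper's findings concern the sharpness dynamics of real networks, which this sketch does not model.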