Weight decay optimization in neural networks requires overparametrization and specific initialization

By PulseAugur Editorial · [1 source] · 2026-06-30 04:00

A new research paper examines the theory of neural network optimization under weight decay. The study shows that the loss landscape of two-layer ReLU networks becomes benign, meaning free of spurious local minima, once the network is heavily overparametrized. This benign landscape matters mainly in the large-initialization regime: with small initializations, training can still converge to spurious local minima despite the landscape's overall favorable properties.
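For readers who want something concrete, the following is a minimal sketch of the kind of objective the summary describes: a two-layer ReLU network trained with an L2 (weight decay) penalty. The squared loss, the 1/2 factors, the penalty applied to both layers, and the names `init_params`, `scale`, and `lam` are all assumptions made for illustration; the summary does not state the paper's exact formulation.

```python
import random


def relu(z):
    """ReLU activation."""
    return z if z > 0.0 else 0.0


def init_params(width, dim, scale, seed=0):
    """Gaussian initialization. `scale` (the standard deviation) is a
    hypothetical knob standing in for the 'initialization size'."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, scale) for _ in range(width)]
    return W, a


def predict(W, a, x):
    """Two-layer ReLU network: f(x) = sum_j a_j * relu(<w_j, x>)."""
    return sum(
        a_j * relu(sum(w * xi for w, xi in zip(w_j, x)))
        for w_j, a_j in zip(W, a)
    )


def regularized_loss(W, a, X, y, lam):
    """Halved mean squared error plus an L2 (weight decay) penalty
    on all weights. The exact form is an assumption."""
    n = len(X)
    mse = sum((predict(W, a, x) - t) ** 2 for x, t in zip(X, y)) / (2 * n)
    penalty = sum(w * w for w_j in W for w in w_j) + sum(a_j * a_j for a_j in a)
    return mse + 0.5 * lam * penalty


# Toy data: two orthogonal inputs with opposite labels.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, -1.0]

# Width (number of hidden neurons) controls overparametrization;
# `scale` controls how large the initial weights are.
W_small, a_small = init_params(width=8, dim=2, scale=0.01)
W_large, a_large = init_params(width=8, dim=2, scale=1.0)
loss_small = regularized_loss(W_small, a_small, X, y, lam=0.1)
loss_large = regularized_loss(W_large, a_large, X, y, lam=0.1)
```

In this sketch, `width` is the overparametrization knob and `scale` is the initialization knob. The code only evaluates the objective at two starting points; it does not reproduce the paper's analysis of where training converges.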

IMPACT Provides theoretical insights into optimizing neural networks, potentially guiding future model development and training strategies.

RANK_REASON Academic paper detailing theoretical findings on neural network optimization.

Etienne Boursier

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Etienne Boursier, Matthew Bowditch, Matthias Englert, Ranko Lazic · 2026-06-30 04:00

Favorability of Loss Landscape with Weight Decay Requires Both Large Overparametrization and Initialization

arXiv:2505.22578v2. Abstract: The optimization of neural networks under weight decay remains poorly understood from a theoretical standpoint. While weight decay is standard practice in modern training procedures, most theoretical analyses focus on unregulari…

