PulseAugur

New papers analyze gradient descent convergence in neural networks

Two new research papers explore the convergence properties of gradient descent in neural network training. The first, focusing on wide shallow models with bounded nonlinearities, proves that non-global minimizers are unstable, which ensures gradient descent converges to global minima under certain conditions. The second analyzes stochastic gradient descent for functions satisfying the Polyak-Łojasiewicz (PL) condition, showing that its asymptotic convergence rate matches that of strongly convex quadratics even in non-convex settings.
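
For background, here is a standard statement of the PL condition and the linear rate it classically implies for plain gradient descent (a textbook result included for orientation, not the second paper's sharper local, asymptotic analysis): a differentiable $f$ with minimum value $f^*$ satisfies the PL condition with constant $\mu > 0$ if $\frac{1}{2}\|\nabla f(x)\|^2 \ge \mu\,(f(x) - f^*)$ for all $x$. If $f$ is additionally $L$-smooth, gradient descent with step size $1/L$, i.e. $x_{k+1} = x_k - \frac{1}{L}\nabla f(x_k)$, satisfies $f(x_{k+1}) - f^* \le (1 - \mu/L)\,(f(x_k) - f^*)$, a linear rate obtained without any convexity assumption.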

Summary written by gemini-2.5-flash-lite from 2 sources.

IMPACT These theoretical analyses advance our understanding of why gradient-based optimization is effective at training complex machine learning models, and may guide future algorithm development.
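
As a toy numerical illustration of that linear rate (a sketch under assumed choices, not an experiment from either paper; the objective, step size, and starting point below are illustrative), gradient descent on a standard non-convex function satisfying the PL condition shrinks the optimality gap geometrically:

import math

# Toy non-convex objective satisfying the PL condition (illustrative choice);
# its global minimum is at x = 0 with f* = 0.
def f(x):
    return x * x + 3.0 * math.sin(x) ** 2

def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)

L = 8.0        # smoothness constant: |f''(x)| = |2 + 6 cos(2x)| <= 8
eta = 1.0 / L  # classical 1/L step size used in the PL linear-rate bound
x = 2.5        # arbitrary starting point, chosen for illustration

for k in range(31):
    if k % 5 == 0:
        print(f"iter {k:2d}   optimality gap f(x) - f* = {f(x):.3e}")
    x -= eta * grad_f(x)

Under the PL inequality above, each such step multiplies the optimality gap by at most $1 - \mu/L$, so the printed gaps decay geometrically.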

RANK_REASON Two academic papers published on arXiv discussing theoretical aspects of optimization algorithms used in machine learning.

Read on arXiv cs.LG →

COVERAGE [2]

  1. arXiv cs.LG TIER_1 · Gabriel Peyré

    On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

    A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially co…

  2. arXiv stat.ML TIER_1 · Thomas Kruse

    Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

    Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient descent for minimizing $C^2$-functions t…