PulseAugur
English(EN) On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

New papers analyze the convergence of gradient descent in neural networks

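Two new research papers examine the convergence properties of gradient descent in neural network training. The first focuses on wide shallow models with bounded nonlinearities and shows that non-global minima are unstable, which ensures that gradient descent converges to a global minimum under certain conditions. The second analyzes stochastic gradient descent on functions that satisfy the Polyak-Łojasiewicz (PL) condition and shows that, even in the non-convex case, its asymptotic convergence rate matches the rate obtained for strongly convex quadratic functions.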
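As a rough illustration of what the PL condition means, the sketch below runs plain (deterministic, not stochastic) gradient descent on a one-dimensional function that is non-convex yet has no stationary points other than its global minimum. The test function f(x) = x^2 + 3 sin^2(x), the starting point, and the step size are assumptions chosen for illustration and are not taken from either paper. The script also estimates the PL ratio (1/2)|f'(x)|^2 / (f(x) - f*) on a grid; a strictly positive minimum is consistent with a PL inequality holding on that range.

```python
import math


def f(x):
    # Illustrative non-convex objective with global minimum f(0) = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def hess_f(x):
    # Second derivative: 2 + 6 cos(2x), which lies in [-4, 8].
    return 2.0 + 6.0 * math.cos(2.0 * x)


def gradient_descent(x0, step, num_steps):
    # Plain gradient descent; returns every iterate, including x0.
    xs = [x0]
    for _ in range(num_steps):
        xs.append(xs[-1] - step * grad_f(xs[-1]))
    return xs


L_SMOOTH = 8.0  # upper bound on |f''|, so 1 / L_SMOOTH is a safe step size
trajectory = gradient_descent(x0=3.0, step=1.0 / L_SMOOTH, num_steps=200)
losses = [f(x) for x in trajectory]

# Grid over [-10, 10] that skips x = 0, where the PL ratio is 0 / 0.
grid = [i / 100.0 for i in range(-1000, 1001) if i != 0]
pl_ratio = min(0.5 * grad_f(x) ** 2 / f(x) for x in grid)
is_nonconvex = any(hess_f(x) < 0.0 for x in grid)

print("final loss:", losses[-1])
print("smallest PL ratio on grid:", pl_ratio)
print("negative curvature somewhere:", is_nonconvex)
```

The negative second derivative confirms that the function is not convex, yet the loss still decreases to the global minimum, which is the kind of behavior the PL condition is meant to capture.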

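Impact: These theoretical analyses help explain why gradient-based optimization methods train complex machine learning models effectively, and may guide future algorithm development.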

Ranking rationale: Two academic papers published on arXiv that address theoretical aspects of optimization algorithms used in machine learning.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.LG TIER_1 English(EN) · Gabriel Peyré ·

    On the global convergence of gradient descent for wide shallow models with bounded nonlinearities

    A surprising phenomenon in the training of neural networks is the ability of gradient descent to find global minimizers of the training loss despite its non-convexity. Following earlier works, we investigate this behavior for wide shallow networks. Existing results essentially co…

  2. arXiv stat.ML TIER_1 English(EN) · Thomas Kruse ·

    Optimal Asymptotic Rates for (Stochastic) Gradient Descent under the Local PL-Condition: A Geometric Approach

    Stochastic gradient descent (SGD) has been studied extensively over the past decades due to its simplicity and broad applicability in machine learning. In this work, we analyze the local behavior of gradient descent and stochastic gradient descent for minimizing $C^2$-functions t…