PulseAugur
A Theory of Generalization in Deep Learning

New theories examine how pre-training and sparse connectivity improve generalization in deep learning

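Three new papers examine the theoretical foundations of generalization in deep learning. The first identifies pre-training as the key factor behind weak-to-strong generalization and shows that this ability emerges through a phase transition during pre-training. The second studies how sparse connectivity in convolutional networks improves generalization by processing inputs in low-dimensional patches, giving a principled explanation for the advantage of these architectures. The third proposes a non-asymptotic theory that explains generalization by showing how the empirical neural tangent kernel partitions the output space into signal and noise directions, and it introduces a practical objective that improves training efficiency and performance.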
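As a rough illustration of the kernel picture described for the third paper, the sketch below uses the standard linearized setting, not the paper's own non-asymptotic theory. It assumes a linear model f(w) = Phi @ w trained with squared loss, for which the tangent kernel on the training set is exactly K = Phi @ Phi.T and the residual evolves as r_{t+1} = (I - eta K) r_t. The feature scales, sizes, and variable names are hypothetical choices made only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 200, 5  # samples, parameters, number of strong feature directions (all hypothetical)

# Hypothetical Jacobian of a linear model f(w) = Phi @ w:
# k feature columns with scale 1, the remaining d - k columns with scale 1e-3.
scales = np.concatenate([np.ones(k), 1e-3 * np.ones(d - k)])
Phi = rng.standard_normal((n, d)) * scales

# For a linear model the tangent kernel on the training set is exactly Phi Phi^T.
K = Phi @ Phi.T
lam, U = np.linalg.eigh(K)  # eigenvalues in ascending order, eigenvectors in columns

eta = 1.0 / lam[-1]  # step size set by the largest kernel eigenvalue
steps = 100
y = rng.standard_normal(n)  # arbitrary targets

# Plain gradient descent on 0.5 * ||Phi w - y||^2, starting from w = 0.
w = np.zeros(d)
for _ in range(steps):
    w -= eta * Phi.T @ (Phi @ w - y)
residual_gd = Phi @ w - y

# Closed form: in the kernel eigenbasis, coordinate i of the residual
# is multiplied by (1 - eta * lam_i) at every step.
r0 = -y
residual_closed = U @ (((1.0 - eta * lam) ** steps) * (U.T @ r0))

# Fraction of the initial residual that survives in each group of directions.
coords0 = U.T @ r0
coordsT = U.T @ residual_gd
top_ratio = np.linalg.norm(coordsT[-k:]) / np.linalg.norm(coords0[-k:])
rest_ratio = np.linalg.norm(coordsT[:-k]) / np.linalg.norm(coords0[:-k])
```

In this toy setting the residual along the k largest-eigenvalue directions is almost entirely removed, while the residual along the near-null directions is left almost untouched. This is the qualitative split between fast-dissipating and slow-moving directions that the abstract describes.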

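Impact: These theoretical advances offer new frameworks for understanding and improving model generalization, and could lead to more capable and more efficient AI systems.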

Ranking rationale: This cluster consists of several academic papers posted on arXiv that focus on the theory of generalization in deep learning.

AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.LG TIER_1 English(EN) · Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu ·

    On the Blessing of Pre-training in Weak-to-Strong Generalization

    arXiv:2605.05710v1 Announce Type: new Abstract: The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work,…

  2. arXiv cs.LG TIER_1 English(EN) · Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang ·

    Does Sparse Connectivity Improve Generalization? Convolutional Networks Below the Edge of Stability

    arXiv:2603.04807v2 Announce Type: replace-cross Abstract: Gradient descent on overparameterized neural networks typically operates at the Edge of Stability (EoS), where the largest Hessian eigenvalue hovers around a step-size-dependent threshold. We study how sparse connectivity …

  3. arXiv stat.ML TIER_1 English(EN) · Elon Litman, Gabe Guo ·

    A Theory of Generalization in Deep Learning

    arXiv:2605.01172v1 Announce Type: cross Abstract: We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal d…

  4. arXiv stat.ML TIER_1 English(EN) · Gabe Guo ·

    A Theory of Generalization in Deep Learning

    We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's nea…
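
The abstract of source 2 refers to a step-size-dependent threshold on the largest Hessian eigenvalue. A minimal sketch of why such a threshold exists, assuming a one-dimensional quadratic loss L(x) = 0.5 * sharpness * x^2: gradient descent multiplies x by (1 - eta * sharpness) at each step, so the iterates shrink only when sharpness < 2 / eta. The function name and parameter values are hypothetical, and the sketch is not taken from the paper.

```python
def gd_on_quadratic(sharpness: float, eta: float, steps: int, x0: float = 1.0) -> float:
    """Run gradient descent on L(x) = 0.5 * sharpness * x**2 and return |x| at the end."""
    x = x0
    for _ in range(steps):
        grad = sharpness * x  # dL/dx
        x -= eta * grad       # equivalent to x *= (1 - eta * sharpness)
    return abs(x)


eta = 0.1
threshold = 2.0 / eta  # classical stability limit for this quadratic

below = gd_on_quadratic(sharpness=0.95 * threshold, eta=eta, steps=50)
above = gd_on_quadratic(sharpness=1.05 * threshold, eta=eta, steps=50)
```

With these values the run just below the threshold contracts toward zero, and the run just above it grows without bound.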