PulseAugur

New theories explore how pre-training and sparse connectivity enhance deep learning generalization

Three new papers examine the theoretical underpinnings of generalization in deep learning. The first identifies pre-training as a critical factor in weak-to-strong generalization, in which a pre-trained strong model surpasses its weak supervisor, and shows that the effect emerges through a phase transition during pre-training. The second investigates how sparse connectivity in convolutional networks can improve generalization: processing inputs in low-dimensional patches offers a principled explanation for their advantage. The third presents a non-asymptotic theory in which the empirical neural tangent kernel partitions the output space into signal directions, where error dissipates rapidly, and orthogonal noise directions; it also introduces a practical objective that improves training efficiency and performance.
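
As background for the third paper's signal and noise picture, the textbook linearized-training calculation below shows how a kernel splits the training residual by eigendirection. It is a sketch under assumptions that are not taken from the paper: squared loss, gradient flow, and a kernel K held fixed during training. The symbols r, J, K, \lambda_i, u_i and \eta are introduced here purely for illustration.

```latex
% Assumptions (illustrative, not from the paper):
%   r(t) = f(X;\theta_t) - y   training residual, loss L = \tfrac12 \|r\|^2
%   J = \partial f(X;\theta) / \partial\theta   Jacobian, treated as constant
%   \eta > 0   learning rate of the gradient flow
\begin{align}
  \dot\theta(t) &= -\eta\, J^\top r(t), \\
  \dot r(t) &= J\,\dot\theta(t) = -\eta\, K\, r(t),
  \qquad K = J J^\top = \sum_i \lambda_i\, u_i u_i^\top, \\
  r(t) &= \sum_i e^{-\eta \lambda_i t}\, \langle u_i, r(0) \rangle\, u_i .
\end{align}
```

Residual components along eigenvectors with large \lambda_i decay quickly, while components along near-zero eigenvalues barely move. Whether and how this carries over to the paper's non-asymptotic setting is for the paper itself to establish.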

IMPACT These theoretical advancements offer new frameworks for understanding and improving model generalization, potentially leading to more robust and efficient AI systems.

RANK_REASON The cluster consists of multiple academic papers published on arXiv, focusing on theoretical aspects of deep learning generalization.

AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv cs.LG TIER_1 English(EN) · Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu ·

    On the Blessing of Pre-training in Weak-to-Strong Generalization

    arXiv:2605.05710v1. Abstract: The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work,…

  2. arXiv cs.LG TIER_1 English(EN) · Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang ·

    Does Sparse Connectivity Improve Generalization? Convolutional Networks Below the Edge of Stability

    arXiv:2603.04807v2. Abstract: Gradient descent on overparameterized neural networks typically operates at the Edge of Stability (EoS), where the largest Hessian eigenvalue hovers around a step-size-dependent threshold. We study how sparse connectivity …

  3. arXiv stat.ML TIER_1 English(EN) · Elon Litman, Gabe Guo ·

    A Theory of Generalization in Deep Learning

    arXiv:2605.01172v1. Abstract: We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal d…

  4. arXiv stat.ML TIER_1 English(EN) · Gabe Guo ·

    A Theory of Generalization in Deep Learning

    We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's nea…