PulseAugur

New theories explore how pre-training and sparse connectivity enhance deep learning generalization

Three new papers explore the theoretical underpinnings of generalization in deep learning. One identifies pre-training as the decisive factor in weak-to-strong generalization, showing that the capability emerges through a phase transition during pre-training. Another investigates how sparse connectivity in convolutional networks improves generalization by processing inputs in low-dimensional patches, offering a principled explanation for their advantage. The third presents a non-asymptotic theory in which the empirical neural tangent kernel partitions the output space into signal and noise directions, and introduces a practical objective that improves training efficiency and performance. Illustrative code sketches for each result follow the coverage list below.

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT These theoretical advancements offer new frameworks for understanding and improving model generalization, potentially leading to more robust and efficient AI systems.

RANK_REASON The cluster consists of multiple academic papers published on arXiv, focusing on theoretical aspects of deep learning generalization.

Read on arXiv stat.ML →

COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu

    On the Blessing of Pre-training in Weak-to-Strong Generalization

    arXiv:2605.05710v1 Announce Type: new Abstract: The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work,…

  2. arXiv cs.LG TIER_1 · Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang

    Does Sparse Connectivity Improve Generalization? Convolutional Networks Below the Edge of Stability

    arXiv:2603.04807v2 Announce Type: replace-cross Abstract: Gradient descent on overparameterized neural networks typically operates at the Edge of Stability (EoS), where the largest Hessian eigenvalue hovers around a step-size-dependent threshold. We study how sparse connectivity …

  3. arXiv stat.ML TIER_1 · Elon Litman, Gabe Guo

    A Theory of Generalization in Deep Learning

    arXiv:2605.01172v1 Announce Type: cross Abstract: We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's nea…

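The weak-to-strong claim in paper 1 can be seen in a toy setting: a student with a richer representation, trained only on a supervisor's noisy labels, ends up more accurate than the supervisor. The NumPy sketch below is a hypothetical caricature (the supervisor is simulated as a labeler that is wrong 25% of the time); it is not the paper's construction and says nothing about its phase-transition analysis.

```python
# Toy weak-to-strong generalization: a "strong" student trained only on a
# weak supervisor's noisy labels surpasses that supervisor. Hypothetical
# setup: the supervisor is a 25%-noise labeler; the student is a
# least-squares probe on the full feature vector.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, p_flip = 50, 2000, 5000, 0.25

# Ground truth: labels are the sign of a linear function of the features.
w_true = rng.normal(size=d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = np.sign(X_tr @ w_true), np.sign(X_te @ w_true)

# Weak supervisor: correct labels, flipped independently with prob p_flip.
weak_tr = np.where(rng.random(n_train) < p_flip, -y_tr, y_tr)
weak_te = np.where(rng.random(n_test) < p_flip, -y_te, y_te)
weak_acc = np.mean(weak_te == y_te)                 # ~ 1 - p_flip = 0.75

# Strong student: least-squares fit to the *weak* labels. Because the label
# noise is unbiased it averages out, so the student recovers a direction
# close to w_true and generalizes better than its supervisor.
w_student = np.linalg.lstsq(X_tr, weak_tr, rcond=None)[0]
student_acc = np.mean(np.sign(X_te @ w_student) == y_te)

print(f"weak supervisor accuracy: {weak_acc:.3f}")
print(f"strong student accuracy:  {student_acc:.3f}")
```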
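The Edge of Stability in paper 2 builds on the classical stability threshold of gradient descent: on a quadratic, GD diverges along any Hessian eigendirection whose eigenvalue exceeds 2/η. The minimal sketch below shows only that threshold, not the paper's analysis of sparse convolutional networks below the EoS.

```python
# Classical stability threshold behind the Edge of Stability: gradient
# descent on the quadratic 0.5 * x^T H x with step size eta contracts along
# an eigendirection with eigenvalue lam iff |1 - eta * lam| < 1, i.e.
# lam < 2 / eta. Toy illustration only.
import numpy as np

def final_loss(lam_max, eta=0.1, steps=200, seed=1):
    """Run GD on 0.5 * x^T H x where H has top eigenvalue lam_max."""
    rng = np.random.default_rng(seed)
    H = np.diag([lam_max, 1.0, 0.5])        # fixed small spectrum
    x = rng.normal(size=3)
    for _ in range(steps):
        x = x - eta * (H @ x)               # plain gradient-descent step
    return 0.5 * x @ H @ x

eta = 0.1                                   # stability threshold: 2/eta = 20
for lam in (5.0, 19.0, 21.0):               # well below, just below, above
    print(f"lambda_max={lam:4.1f}  final loss={final_loss(lam, eta):.3e}")
```

At the EoS itself, sharpness hovers near 2/η rather than falling cleanly into either regime; the sketch only exhibits the stable and unstable sides of the threshold.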
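The mechanism in paper 3 is easiest to see in a linearized model, where the training residual evolves as r ← (I − ηK) r with K = JJᵀ the empirical NTK: error along large-eigenvalue (signal) directions of K decays geometrically, while components in K's null (noise) directions do not move. The sketch below shows that standard linearized picture under an assumed random Jacobian; it is not the paper's non-asymptotic theory.

```python
# Linearized-training picture of the NTK: the residual r on the training set
# evolves as r <- (I - eta * K) r, where K = J J^T is the empirical NTK.
# Here J is a random Jacobian with fewer parameters than outputs, so K is
# rank-deficient: signal directions (large eigenvalues) decay fast, while
# null directions never move.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 20                               # outputs (samples) >> parameters

J = rng.normal(size=(n, p)) / np.sqrt(p)     # Jacobian of outputs wrt params
K = J @ J.T                                  # empirical NTK, rank <= p
eigvals, eigvecs = np.linalg.eigh(K)         # ascending eigenvalues

r = rng.normal(size=n)                       # initial residual
eta = 0.5 / eigvals[-1]                      # stable step size

for t in range(1, 201):
    r = r - eta * (K @ r)                    # linearized gradient-descent step
    if t in (1, 10, 50, 200):
        sig = abs(eigvecs[:, -1] @ r)        # along top ("signal") direction
        nul = abs(eigvecs[:, 0] @ r)         # along a null ("noise") direction
        print(f"t={t:3d}  signal residual={sig:.2e}  noise residual={nul:.2e}")
```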