PulseAugur

New theories explore how pre-training and sparse connectivity enhance deep learning generalization

Three new papers explore the theoretical underpinnings of generalization in deep learning. One identifies pre-training as the decisive factor in weak-to-strong generalization, showing that the capability emerges through a phase transition during pre-training. Another investigates how sparse connectivity in convolutional networks improves generalization by processing inputs in low-dimensional patches, offering a principled explanation for their advantage. The third presents a non-asymptotic theory in which the empirical neural tangent kernel partitions the output space into signal and noise directions, and introduces a practical objective that improves training efficiency and performance. Illustrative code sketches for each result follow the coverage list below.

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT These theoretical advancements offer new frameworks for understanding and improving model generalization, potentially leading to more robust and efficient AI systems.

RANK_REASON The cluster consists of multiple academic papers published on arXiv, focusing on theoretical aspects of deep learning generalization.

Read on arXiv stat.ML →

COVERAGE [3]

  1. arXiv cs.LG TIER_1 · Wei Yao, Wang Zhaoyang, Gengze Xu, Chen Qian, Dongrui Liu, Ziqiao Wang, Yong Liu, Yunbei Xu

    On the Blessing of Pre-training in Weak-to-Strong Generalization

    arXiv:2605.05710v1 Announce Type: new Abstract: The paradigm of Weak-to-Strong Generalization (W2SG) suggests that a pre-trained strong model can surpass its weak supervisor, yet the decisive role of pre-training remains theoretically and empirically under-explored. In this work,…

  2. arXiv cs.LG TIER_1 · Tongtong Liang, Esha Singh, Rahul Parhi, Alexander Cloninger, Yu-Xiang Wang

    Does Sparse Connectivity Improve Generalization? Convolutional Networks Below the Edge of Stability

    arXiv:2603.04807v2 Announce Type: replace-cross Abstract: Gradient descent on overparameterized neural networks typically operates at the Edge of Stability (EoS), where the largest Hessian eigenvalue hovers around a step-size-dependent threshold. We study how sparse connectivity …

  3. arXiv stat.ML TIER_1 · Elon Litman, Gabe Guo

    A Theory of Generalization in Deep Learning

    arXiv:2605.01172v1 Announce Type: cross Abstract: We present a non-asymptotic theory of generalization in deep learning where the empirical neural tangent kernel partitions the output space. In directions corresponding to signal, error dissipates rapidly; in the vast orthogonal dimensions corresponding to noise, the kernel's nea…

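The weak-to-strong claim in paper 1 can be seen in a toy setting: a student with a richer representation, trained only on a supervisor's noisy labels, ends up more accurate than the supervisor. The NumPy sketch below is a hypothetical caricature (the supervisor is simulated as a labeler that is wrong 25% of the time); it is not the paper's construction and says nothing about its phase-transition analysis.

```python
# Toy weak-to-strong generalization: a "strong" student trained only on a
# weak supervisor's noisy labels surpasses that supervisor. Hypothetical
# setup: the supervisor is a 25%-noise labeler; the student is a
# least-squares probe on the full feature vector.
import numpy as np

rng = np.random.default_rng(0)
d, n_train, n_test, p_flip = 50, 2000, 5000, 0.25

# Ground truth: labels are the sign of a linear function of the features.
w_true = rng.normal(size=d)
X_tr, X_te = rng.normal(size=(n_train, d)), rng.normal(size=(n_test, d))
y_tr, y_te = np.sign(X_tr @ w_true), np.sign(X_te @ w_true)

# Weak supervisor: correct labels, flipped independently with prob p_flip.
weak_tr = np.where(rng.random(n_train) < p_flip, -y_tr, y_tr)
weak_te = np.where(rng.random(n_test) < p_flip, -y_te, y_te)
weak_acc = np.mean(weak_te == y_te)                 # ~ 1 - p_flip = 0.75

# Strong student: least-squares fit to the *weak* labels. Because the label
# noise is unbiased it averages out, so the student recovers a direction
# close to w_true and generalizes better than its supervisor.
w_student = np.linalg.lstsq(X_tr, weak_tr, rcond=None)[0]
student_acc = np.mean(np.sign(X_te @ w_student) == y_te)

print(f"weak supervisor accuracy: {weak_acc:.3f}")
print(f"strong student accuracy:  {student_acc:.3f}")
```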
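The Edge of Stability in paper 2 builds on the classical stability threshold of gradient descent: on a quadratic, GD diverges along any Hessian eigendirection whose eigenvalue exceeds 2/η. The minimal sketch below shows only that threshold, not the paper's analysis of sparse convolutional networks below the EoS.

```python
# Classical stability threshold behind the Edge of Stability: gradient
# descent on the quadratic 0.5 * x^T H x with step size eta contracts along
# an eigendirection with eigenvalue lam iff |1 - eta * lam| < 1, i.e.
# lam < 2 / eta. Toy illustration only.
import numpy as np

def final_loss(lam_max, eta=0.1, steps=200, seed=1):
    """Run GD on 0.5 * x^T H x where H has top eigenvalue lam_max."""
    rng = np.random.default_rng(seed)
    H = np.diag([lam_max, 1.0, 0.5])        # fixed small spectrum
    x = rng.normal(size=3)
    for _ in range(steps):
        x = x - eta * (H @ x)               # plain gradient-descent step
    return 0.5 * x @ H @ x

eta = 0.1                                   # stability threshold: 2/eta = 20
for lam in (5.0, 19.0, 21.0):               # well below, just below, above
    print(f"lambda_max={lam:4.1f}  final loss={final_loss(lam, eta):.3e}")
```

At the EoS itself, sharpness hovers near 2/η rather than falling cleanly into either regime; the sketch only exhibits the stable and unstable sides of the threshold.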
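The mechanism in paper 3 is easiest to see in a linearized model, where the training residual evolves as r ← (I − ηK) r with K = JJᵀ the empirical NTK: error along large-eigenvalue (signal) directions of K decays geometrically, while components in K's null (noise) directions do not move. The sketch below shows that standard linearized picture under an assumed random Jacobian; it is not the paper's non-asymptotic theory.

```python
# Linearized-training picture of the NTK: the residual r on the training set
# evolves as r <- (I - eta * K) r, where K = J J^T is the empirical NTK.
# Here J is a random Jacobian with fewer parameters than outputs, so K is
# rank-deficient: signal directions (large eigenvalues) decay fast, while
# null directions never move.
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 20                               # outputs (samples) >> parameters

J = rng.normal(size=(n, p)) / np.sqrt(p)     # Jacobian of outputs wrt params
K = J @ J.T                                  # empirical NTK, rank <= p
eigvals, eigvecs = np.linalg.eigh(K)         # ascending eigenvalues

r = rng.normal(size=n)                       # initial residual
eta = 0.5 / eigvals[-1]                      # stable step size

for t in range(1, 201):
    r = r - eta * (K @ r)                    # linearized gradient-descent step
    if t in (1, 10, 50, 200):
        sig = abs(eigvecs[:, -1] @ r)        # along top ("signal") direction
        nul = abs(eigvecs[:, 0] @ r)         # along a null ("noise") direction
        print(f"t={t:3d}  signal residual={sig:.2e}  noise residual={nul:.2e}")
```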