This paper theoretically investigates how data geometry influences generalization in overparameterized neural networks trained below the edge of stability. It derives generalization bounds for two-layer ReLU networks that adapt to the intrinsic dimension of the data distribution, showing that distributions which are harder to shatter with ReLU activation thresholds generalize better, while data concentrated on a sphere favors memorization.
Summary written by gemini-2.5-flash-lite from 1 source.
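As a rough schematic of what "adapting to intrinsic dimension" means here (an assumed generic form, not the paper's stated theorem), such bounds typically replace the ambient dimension with the intrinsic dimension d of the data support:

\[ \mathbb{E}[L(\hat{f})] - \hat{L}_n(\hat{f}) \lesssim \sqrt{\frac{C(d)}{n}}, \]

where \hat{f} is the trained network, L and \hat{L}_n are the population and empirical risks, n is the sample size, and C(d) is a complexity term that grows with the intrinsic dimension d rather than the ambient dimension, so data concentrated near a low-dimensional set yields a tighter generalization gap.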
IMPACT Provides theoretical insights into neural network generalization, potentially guiding future model architectures and training strategies.
RANK_REASON This is a theoretical research paper published on arXiv concerning neural network generalization.