Dropout Universality: Scaling Laws and Optimal Scheduling at the Edge-of-Chaos
Researchers have developed a mean-field theory of dropout in neural networks that treats dropout as a perturbation of critical signal propagation. The theory identifies distinct universality classes for smooth and ReLU-like activation functions, each with its own critical exponents and scaling behavior. The framework also suggests optimal dropout scheduling strategies that can reduce test loss and improve accuracy without increasing computational cost. The predictions were tested on MLPs and Vision Transformers.
AI IMPACT: Provides a theoretical framework for optimizing dropout scheduling, potentially improving model performance at no added computational cost.
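For readers unfamiliar with the term, the following is a minimal sketch of what a dropout schedule looks like in code. It assumes a simple linear ramp of the drop probability over training. The summary does not state which schedule the theory recommends, so `linear_dropout_rate`, `p_start`, and `p_end` are hypothetical names and values chosen only for illustration.

```python
import random


def linear_dropout_rate(step, total_steps, p_start=0.0, p_end=0.2):
    """Hypothetical schedule: interpolate the drop probability linearly over training."""
    if total_steps <= 1:
        return p_end
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / (total_steps - 1), 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


def dropout(values, p, rng):
    """Inverted dropout: zero each entry with probability p, rescale survivors by 1/(1-p)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    keep = 1.0 - p
    return [v / keep if rng.random() < keep else 0.0 for v in values]


if __name__ == "__main__":
    rng = random.Random(0)
    activations = [1.0] * 8
    # Show the scheduled rate and one dropout sample at a few training steps.
    for step in (0, 50, 99):
        p = linear_dropout_rate(step, total_steps=100)
        print(step, round(p, 3), dropout(activations, p, rng))
```

A schedule of this kind only changes one scalar per training step, which is consistent with the claim that scheduling adds no computational cost.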