Exploding and vanishing gradients in deep neural networks: the effect of residual connections
A new research paper analyzes exploding and vanishing gradients in deep neural networks, focusing on the effect of residual connections. Using multiplicative ergodic theory and a characterization of Liapunov exponents due to Furstenberg and Kifer, the study gives a precise statement about the Liapunov spectrum and how residual connections change it.
AI IMPACT: Provides theoretical insight into the training dynamics of deep neural networks, which could inform future model architectures.
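To make the role of the Liapunov spectrum concrete, here is a minimal numerical sketch. It is not the paper's method. It assumes, purely for illustration, that each layer's Jacobian is an independent Gaussian random matrix `W` in the plain case and `I + W` in the residual case (a stand-in for a block of the form `x + f(x)`). The names `lyapunov_spectrum`, `plain`, and `residual`, and the values of `dim`, `depth`, and `scale`, are hypothetical choices. The exponents are estimated with the standard repeated-QR procedure for products of random matrices.

```python
import numpy as np

def lyapunov_spectrum(sample_jacobian, dim, depth, rng):
    """Estimate the Liapunov exponents of a product of `depth` random
    matrices by re-orthonormalizing with QR after every factor."""
    q = np.eye(dim)
    log_sums = np.zeros(dim)
    for _ in range(depth):
        # Push the current orthonormal frame through one more layer Jacobian.
        q, r = np.linalg.qr(sample_jacobian(rng) @ q)
        # |diag(r)| records how much each direction was stretched or shrunk.
        log_sums += np.log(np.abs(np.diag(r)))
    # Average log growth rate per layer, sorted from largest to smallest.
    return np.sort(log_sums / depth)[::-1]

# Illustrative values only (assumptions, not taken from the paper).
dim, depth, scale = 8, 2000, 0.1
rng = np.random.default_rng(0)

def plain(rng):
    # Assumed layer Jacobian without a skip connection: W.
    return rng.normal(0.0, scale, size=(dim, dim))

def residual(rng):
    # Assumed layer Jacobian with a skip connection: I + W.
    return np.eye(dim) + rng.normal(0.0, scale, size=(dim, dim))

plain_spec = lyapunov_spectrum(plain, dim, depth, rng)
res_spec = lyapunov_spectrum(residual, dim, depth, rng)

print("plain    top exponent:", plain_spec[0])
print("residual top exponent:", res_spec[0])
```

A strongly negative top exponent means gradient norms shrink roughly exponentially with depth (vanishing), and a strongly positive one means they grow (exploding). In this toy setting the identity term keeps every exponent close to zero, which is the qualitative effect the summary attributes to residual connections. The precise statement in the paper may differ from what this simplified model shows.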