Researchers have analyzed signal propagation in finite-width linear recurrent models, finding that infinite-width approximations lose accuracy as the number of recurrent steps $t$ grows relative to the model width $n$. They identify three regimes: subcritical ($t = o(\sqrt{n})$), where the approximation holds; critical ($t \sim c\sqrt{n}$), where deviations emerge; and supercritical ($t \gg \sqrt{n}$), where finite-width effects dominate. The work pinpoints when standard initialization schemes such as Glorot become unstable and shows that finite-width effects accumulate faster in recurrent models than in feedforward ones.
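A minimal numerical sketch of the scaling claim, not from the sources: for a width-$n$ linear RNN $h_{t+1} = W h_t$ with i.i.d. $\mathcal{N}(0, 1/n)$ entries (the Glorot variance for a square recurrent matrix), $\mathbb{E}\|W h\|^2 = \|h\|^2$ for any fixed $h$, so the infinite-width theory predicts the squared norm is preserved at every step. Tracking the empirical norm ratio across widths then exposes where finite-width deviations set in. The function `norm_ratio_curve` and all width and trial-count choices below are illustrative assumptions.

```python
# Minimal sketch (assumptions noted above): measure how far the empirical
# squared norm of h_t = W^t h_0 drifts from the infinite-width prediction
# ||h_0||^2 as the step count t grows relative to the width n.
import numpy as np

rng = np.random.default_rng(0)

def norm_ratio_curve(n, t_max, trials=30):
    """Mean and std over trials of ||h_t||^2 / ||h_0||^2 for t = 1..t_max.

    W has i.i.d. N(0, 1/n) entries, so the infinite-width theory predicts
    the ratio stays at 1; at finite width the ratio fluctuates, and the
    fluctuations accumulate with t.
    """
    ratios = np.empty((trials, t_max))
    for k in range(trials):
        W = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, n))
        h = rng.normal(size=n)
        norm0 = h @ h
        for t in range(t_max):
            h = W @ h
            ratios[k, t] = (h @ h) / norm0
    return ratios.mean(axis=0), ratios.std(axis=0)

for n in (64, 256, 1024):
    t_crit = int(np.sqrt(n))  # boundary of the critical regime, t ~ sqrt(n)
    mean, std = norm_ratio_curve(n, 4 * t_crit)
    print(f"n={n:5d}: ratio at t=sqrt(n): {mean[t_crit - 1]:.2f} "
          f"(+/- {std[t_crit - 1]:.2f}), at t=4*sqrt(n): {mean[-1]:.2f} "
          f"(+/- {std[-1]:.2f})")
```

Consistent with the three-regime picture, the ratio should sit near 1 well below $t \approx \sqrt{n}$ and drift by $O(1)$ factors with growing spread once $t$ passes that scale.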
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Identifies the precise depth-width scaling at which infinite-width theory breaks down in recurrent models, informing when standard initialization schemes remain stable.
RANK_REASON This is a research paper published on arXiv detailing theoretical findings on signal propagation in linear recurrent models.