Researchers have investigated whether a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. For activations such as ReLU² and ReGLU, absorption is impossible by a degree argument: the residual-free network is positively homogeneous of degree 2, so it cannot reproduce the degree-1 identity term contributed by the skip. For gated activations such as SwiGLU and GeGLU, a linearization argument leads to the same conclusion. Absorption is possible for ungated ReLU and GELU, but only under specific, non-generic weight conditions; in general, skip-connected and residual-free MLPs represent distinct function classes.
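As a concrete illustration of the non-generic ReLU case, the sketch below (not taken from the paper; all weights here are hypothetical) constructs a one-dimensional, width-2 example where absorption is exact, exploiting the identity ReLU(x) - ReLU(-x) = x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-generic first layer: its two rows compute ReLU(x) and ReLU(-x),
# whose difference recovers x. The second layer is arbitrary.
W1 = np.array([[1.0], [-1.0]])   # shape (2, 1)
W2 = rng.normal(size=(1, 2))     # shape (1, 2)

def skip_mlp(x):
    """Skip-connected MLP: f(x) = x + W2 @ ReLU(W1 @ x)."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)

# Residual-free MLP of the same width: absorb the skip into the second
# layer, since [1, -1] @ ReLU(W1 @ x) = ReLU(x) - ReLU(-x) = x.
V1 = W1
V2 = W2 + np.array([[1.0, -1.0]])

def plain_mlp(x):
    """Residual-free MLP: g(x) = V2 @ ReLU(V1 @ x)."""
    return V2 @ np.maximum(V1 @ x, 0.0)

xs = rng.normal(size=(1, 1000))
print(np.max(np.abs(skip_mlp(xs) - plain_mlp(xs))))  # ~0 (rounding only)
```

The trick depends on W1 containing the ± identity rows, which generic weights do not provide, matching the summary's non-genericity caveat.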
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Explores theoretical limitations of MLP architectures, potentially influencing future model design.
RANK_REASON This is a research paper published on arXiv discussing theoretical properties of MLPs.