Researchers investigated whether a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. For activation functions such as ReLU^2 and ReGLU, they show that absorption is impossible by a degree argument; for gated activations such as SwiGLU and GeGLU, a linearization argument leads to the same conclusion. Absorption is possible for ungated ReLU and GELU, but only under specific, non-generic weight conditions, so skip-connected and residual-free MLPs generally represent distinct function classes.
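The summary does not spell out the paper's exact setting, so the following is only a minimal numerical illustration of the two kinds of obstruction it names, not the paper's proof. It rests on assumptions of this sketch rather than of the paper: bias-free layers, small random Gaussian weights, and hypothetical helper names (`relu2_mlp`, `swiglu_mlp`, `with_skip`, `jacobian_at_zero`). Under those assumptions, a ReLU^2 MLP is degree-2 homogeneous (f(2x) = 4f(x)), and adding the degree-1 identity breaks that. Likewise, a SwiGLU MLP has a zero Jacobian at the origin, while its skip-connected version has the identity Jacobian there.

```python
import math
import random

# Illustrative sketch, NOT the paper's proof. Assumptions of this sketch:
# bias-free layers, input/output dimension d, hidden width h, random
# Gaussian weights. All helper names are hypothetical.


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def relu(z):
    return max(z, 0.0)


def silu(z):
    # SiLU / Swish: z * sigmoid(z); note silu(0) == 0
    return z / (1.0 + math.exp(-z))


def relu2_mlp(x, w1, w2):
    """Residual-free ReLU^2 MLP: W2 relu(W1 x)^2 (bias-free)."""
    return matvec(w2, [relu(z) ** 2 for z in matvec(w1, x)])


def swiglu_mlp(x, w1, v, w2):
    """Residual-free SwiGLU MLP: W2 [(W1 x) * silu(V x)] (bias-free)."""
    hidden = [a * silu(b) for a, b in zip(matvec(w1, x), matvec(v, x))]
    return matvec(w2, hidden)


def with_skip(mlp):
    """Wrap an MLP as x -> x + mlp(x)."""
    return lambda x, *ws: [xi + yi for xi, yi in zip(x, mlp(x, *ws))]


def jacobian_at_zero(f, d, eps=1e-5):
    """Central finite-difference Jacobian of f at the origin (rows = outputs)."""
    cols = []
    for j in range(d):
        e = [eps if k == j else 0.0 for k in range(d)]
        plus, minus = f(e), f([-t for t in e])
        cols.append([(p - q) / (2 * eps) for p, q in zip(plus, minus)])
    return [[cols[j][i] for j in range(d)] for i in range(len(cols[0]))]


def max_abs_diff(a, b):
    return max(abs(p - q) for p, q in zip(a, b))


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
d, h = 3, 4
w1, v, w2 = rand_matrix(h, d), rand_matrix(h, d), rand_matrix(d, h)
x = [random.gauss(0.0, 1.0) for _ in range(d)]
x2 = [2.0 * t for t in x]

# Degree argument: a bias-free ReLU^2 MLP is 2-homogeneous, f(2x) = 4 f(x).
# With a skip, x + g(x) breaks this: (2x + 4g(x)) - 4(x + g(x)) = -2x.
relu2_skip = with_skip(relu2_mlp)
plain_gap = max_abs_diff(relu2_mlp(x2, w1, w2), [4 * t for t in relu2_mlp(x, w1, w2)])
skip_gap = max_abs_diff(relu2_skip(x2, w1, w2), [4 * t for t in relu2_skip(x, w1, w2)])

# Linearization argument: each SwiGLU hidden unit is a product of two factors
# that both vanish at x = 0, so the bias-free MLP has a zero Jacobian there,
# while the skip-connected version has Jacobian I.
swiglu_skip = with_skip(swiglu_mlp)
jac_plain = jacobian_at_zero(lambda z: swiglu_mlp(z, w1, v, w2), d)
jac_skip = jacobian_at_zero(lambda z: swiglu_skip(z, w1, v, w2), d)

print(f"ReLU^2 homogeneity gap: plain={plain_gap:.2e}, skip={skip_gap:.2e}")
print("SwiGLU Jacobian at 0 (plain):", [[round(a, 6) for a in row] for row in jac_plain])
print("SwiGLU Jacobian at 0 (skip): ", [[round(a, 6) for a in row] for row in jac_skip])
```

Running it should show a near-zero homogeneity gap for the plain ReLU^2 MLP and a clearly nonzero one for the skip version. It should also show a near-zero Jacobian for the plain SwiGLU MLP versus a near-identity Jacobian for the skip version. The same reasoning is expected to carry over to ReGLU (also degree-2 homogeneous without biases) and GeGLU (GELU(0) = 0), but the sketch does not check them. It also says nothing about how the paper handles biases or the non-generic ReLU/GELU cases.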
Impact: Explores theoretical limitations of MLP architectures, potentially influencing future model design.
Ranking rationale: This is a research paper published on arXiv that discusses theoretical properties of MLPs.
AI-generated summary · Google Gemini · from 1 source.