
MLP skip connections can't be absorbed into residual-free models

Researchers have investigated whether a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. For activations such as ReLU^2 and ReGLU, they show absorption is impossible via degree arguments; for gated activations such as SwiGLU and GeGLU, a linearization argument yields the same conclusion. Although absorption is possible for ungated ReLU and GELU under specific, non-generic weight conditions, skip-connected and residual-free MLPs generally represent distinct function classes.
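
To make the question concrete, here is one way to state it (notation ours, inferred from the abstract; biases omitted for simplicity). With input x in R^d, hidden width m, activation σ, and skip-connected weights W_1, W_2, absorption asks whether there exist residual-free weights V_1, V_2 of the same shapes with

\[
x + W_2\,\sigma(W_1 x) \;=\; V_2\,\sigma(V_1 x) \qquad \text{for all } x \in \mathbb{R}^d .
\]

For σ(t) = ReLU(t)^2 with zero biases, the right-hand side is positively 2-homogeneous (scaling x by λ ≥ 0 scales the output by λ^2), while the identity term on the left scales only linearly; a mismatch of this kind is the flavor of obstruction a degree argument can exploit.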

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Establishes theoretical limits on when skip connections can be absorbed into MLPs, a distinction that may inform future architecture design.

RANK_REASON This is a research paper published on arXiv discussing theoretical properties of MLPs.

Read on arXiv cs.LG →

COVERAGE [1]

  1. arXiv cs.LG TIER_1 · Antonij Mijoski, Marko Karbevski

    Can an MLP Absorb Its Own Skip Connection?

    arXiv:2604.23705v1 · Abstract: We study when a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. We first show that for any architecture whose skip branch is an invertible linear map (including Hyper-Conn…
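
As a minimal numerical sketch of the homogeneity intuition above (our own construction, not from the paper; random weights, zero biases, σ = ReLU^2):

import numpy as np

# Toy check: with zero biases and sigma(t) = relu(t)**2, a residual-free
# MLP g(x) = V2 @ sigma(V1 @ x) is positively 2-homogeneous, while a
# skip-connected MLP f(x) = x + W2 @ sigma(W1 @ x) is not, because the
# identity term scales linearly rather than quadratically.

rng = np.random.default_rng(0)
d, m = 4, 8  # input dimension, hidden width

def relu_sq(t):
    return np.maximum(t, 0.0) ** 2

W1 = rng.standard_normal((m, d))
W2 = rng.standard_normal((d, m))

def skip_mlp(x):
    return x + W2 @ relu_sq(W1 @ x)

def resfree_mlp(x):
    return W2 @ relu_sq(W1 @ x)

x = rng.standard_normal(d)
lam = 3.0

# Exactly 2-homogeneous: relu_sq(lam * t) == lam**2 * relu_sq(t) for lam >= 0.
print(np.allclose(resfree_mlp(lam * x), lam**2 * resfree_mlp(x)))  # True

# The skip term breaks 2-homogeneity: f(lam*x) - lam**2 * f(x) = (lam - lam**2) * x.
print(np.allclose(skip_mlp(lam * x), lam**2 * skip_mlp(x)))  # False

This prints True then False: the residual-free model scales quadratically under input scaling while the skip model does not, matching the degree mismatch sketched above.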