Researchers have investigated whether a skip connection around a single-hidden-layer MLP can be absorbed into a residual-free MLP of the same width. For activations such as ReLU² and ReGLU, absorption is impossible by a degree argument: the residual-free network is positively homogeneous of degree 2, so it cannot reproduce the degree-1 identity term contributed by the skip. For gated activations such as SwiGLU and GeGLU, a linearization argument leads to the same conclusion. Absorption is possible for ungated ReLU and GELU, but only under specific, non-generic weight conditions; in general, skip-connected and residual-free MLPs represent distinct function classes.
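As a concrete illustration of the non-generic ReLU case, the sketch below (not taken from the paper; all weights here are hypothetical) constructs a one-dimensional, width-2 example where absorption is exact, exploiting the identity ReLU(x) - ReLU(-x) = x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Non-generic first layer: its two rows compute ReLU(x) and ReLU(-x),
# whose difference recovers x. The second layer is arbitrary.
W1 = np.array([[1.0], [-1.0]])   # shape (2, 1)
W2 = rng.normal(size=(1, 2))     # shape (1, 2)

def skip_mlp(x):
    """Skip-connected MLP: f(x) = x + W2 @ ReLU(W1 @ x)."""
    return x + W2 @ np.maximum(W1 @ x, 0.0)

# Residual-free MLP of the same width: absorb the skip into the second
# layer, since [1, -1] @ ReLU(W1 @ x) = ReLU(x) - ReLU(-x) = x.
V1 = W1
V2 = W2 + np.array([[1.0, -1.0]])

def plain_mlp(x):
    """Residual-free MLP: g(x) = V2 @ ReLU(V1 @ x)."""
    return V2 @ np.maximum(V1 @ x, 0.0)

xs = rng.normal(size=(1, 1000))
print(np.max(np.abs(skip_mlp(xs) - plain_mlp(xs))))  # ~0 (rounding only)
```

The trick depends on W1 containing the ± identity rows, which generic weights do not provide, matching the summary's non-genericity caveat.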
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Explores theoretical limitations of MLP architectures, potentially influencing future model design.
RANK_REASON This is a research paper published on arXiv discussing theoretical properties of MLPs.