Transformer Feed-Forward Blocks: Linearity is Learned, Not Architectural

By PulseAugur Editorial · [1 source] · 2026-06-19 04:00

Researchers measured how linear Transformer feed-forward network (FFN) blocks actually are and report that the degree of linearity is a learned property rather than an architectural one. Using a per-block linear recoverability score (R^2_lin) on GPT-2, Pythia-160m, and llama-160m, they observed significant variation between adjacent blocks. The score also serves as a compression signal, indicating which blocks can be safely replaced with smaller, single-layer approximations.
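As a rough illustration of what a linear recoverability score can look like, the sketch below fits the best affine map from an FFN block's inputs to its outputs by least squares and reports the pooled coefficient of determination. This is an assumption about the general idea, not the paper's exact definition of R^2_lin: the toy random-weight FFN, the GELU activation, the Gaussian inputs, and the helper names `linear_r2` and `ffn` are all hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU (assumed activation for this toy block)
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def linear_r2(X, Y):
    """R^2 of the best affine fit Y ~ X A + b, pooled over output dimensions.

    Hypothetical helper: a standard coefficient of determination, which may
    differ from the paper's R^2_lin in normalization or evaluation split.
    """
    X1 = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)   # least-squares affine fit
    resid = Y - X1 @ coef
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((Y - Y.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 16, 64, 4096

# Toy FFN with random weights; a real measurement would use a trained block
# and activations collected from real text.
W1 = rng.normal(size=(d_model, d_ff)) / np.sqrt(d_model)
b1 = rng.normal(size=d_ff) * 0.1
W2 = rng.normal(size=(d_ff, d_model)) / np.sqrt(d_ff)
b2 = np.zeros(d_model)

def ffn(X):
    # Position-wise map: each row (token) is transformed independently.
    return gelu(X @ W1 + b1) @ W2 + b2

X = rng.normal(size=(n_tokens, d_model))

r2_ffn = linear_r2(X, ffn(X))                           # nonlinear block
r2_affine = linear_r2(X, X @ W1[:, :d_model] + 0.5)     # exactly affine map

print(f"toy FFN R^2:    {r2_ffn:.4f}")
print(f"affine map R^2: {r2_affine:.4f}")
```

A score near 1 means an affine map reproduces almost all of the block's output variance on the sampled inputs, which is the situation in which replacing the block with a single linear layer loses little. The score here is computed in-sample; a held-out split would give a more conservative estimate.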

IMPACT Provides insights into the internal workings of transformer models, potentially informing future architectural designs and compression techniques.

RANK_REASON Academic paper detailing research findings on transformer architecture.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Stuart Whipp · 2026-06-19 04:00

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

arXiv:2606.19379v1 Announce Type: cross Abstract: Transformer feed-forward networks (FFNs) are often treated as nonlinear stores of computation, yet how nonlinear a trained FFN block actually is has rarely been measured. We treat each FFN as a position-wise input-to-output map an…
