Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 11h

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

Researchers have investigated the linearity of Transformer feed-forward networks (FFNs), finding that the degree to which an FFN block is linear is a learned property rather than an architectural one. By measuring the linear recoverability (R^2_lin) across different transformer models like GPT-2, Pythia-160m, and llama-160m, they observed significant variation between adjacent blocks. This measurement also serves as a compression signal, indicating which blocks can be safely replaced with smaller, single-layer approximations. AI

IMPACT Provides insights into the internal workings of transformer models, potentially informing future architectural designs and compression techniques.

Hugging Face
GPT-2
arXiv
transformer
Gelu
Pythia-160M
feed forward network
llama-160m