Brief · PulseAugur

RESEARCH · arXiv cs.AI · 23h · [2 sources]

Where does Absolute Position come from in decoder-only Transformers?

Researchers have identified two architectural components of decoder-only Transformers that let the model distinguish absolute position, even though positional encoding methods such as RoPE primarily encode relative offsets. The first is the causal mask, whose softmax denominator inherently depends on the query position. The second is the residual stream, which acts as a dynamical system at position 0. The study analyzes how architectural choices such as NTK scaling and sliding-window attention interact with these components to influence the model's positional awareness.
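The causal-mask effect can be illustrated with a toy calculation. The sketch below is not taken from the study. It assumes a single attention head whose scores are all equal, so that no positional encoding is involved, and the names (`causal_softmax_weights`, `uniform_scores`, `recovered_position`) are hypothetical. Under that assumption, query i normalizes over i + 1 visible keys, so the weight it places on the first token is 1/(i + 1), which is enough to read off the absolute position.

```python
import math

def causal_softmax_weights(scores):
    """Row-wise softmax under a causal mask: query i only sees keys 0..i."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]          # keys allowed by the causal mask
        m = max(visible)                      # subtract max for numerical stability
        exps = [math.exp(s - m) for s in visible]
        denom = sum(exps)                     # grows with the query position i
        row = [e / denom for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
    return weights

# Assumption: identical scores everywhere, i.e. no positional signal in the logits.
n = 6
uniform_scores = [[0.0] * n for _ in range(n)]
weights = causal_softmax_weights(uniform_scores)

# The weight each query puts on token 0 is 1 / (i + 1).
weight_on_first_token = [weights[i][0] for i in range(n)]

# Inverting that weight recovers the absolute query position.
recovered_position = [round(1.0 / w) - 1 for w in weight_on_first_token]
print(recovered_position)
```

In this toy setting the position signal comes entirely from the number of keys in the softmax denominator. It says nothing about how strongly a trained model relies on that signal.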

IMPACT Reveals how architectural choices enable absolute position understanding in LLMs, potentially guiding future model design.

RoPE
decoder-only Transformers
NTK scaling
sliding-window attention