Brief · PulseAugur

RESEARCH · arXiv stat.ML English(EN) · 3d · [2 sources]

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

Researchers have developed a Bayesian theory to explain how "copy heads" emerge in transformer attention mechanisms. Their analysis of a single-layer softmax attention network reveals a phase transition in the formation of these attention patterns, governed by the amount of training data. The framework offers a first-principles explanation for the abrupt appearance of specific subcircuits, similar to what has been observed during large language model training.
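
For readers unfamiliar with the term, the following toy sketch illustrates what a copy-like pattern looks like in a single softmax attention head. It is not the paper's model and does not reproduce its Bayesian analysis or the phase transition. The one-hot embeddings, the identity query-key map, and the sharpness parameter `beta` are all assumptions made for illustration only.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(token, vocab_size):
    return [1.0 if i == token else 0.0 for i in range(vocab_size)]

def copy_attention(context, query_token, vocab_size, beta):
    """Toy single-head softmax attention (hypothetical setup).

    Keys and values are one-hot token embeddings, and the query-key
    map is assumed to be beta * identity, so the score at a position
    is beta when its token matches the query token and 0 otherwise.
    """
    q = one_hot(query_token, vocab_size)
    keys = [one_hot(t, vocab_size) for t in context]
    scores = [beta * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    weights = softmax(scores)
    # Output is the attention-weighted average of the one-hot values,
    # i.e. a distribution over the vocabulary.
    output = [sum(w * k[i] for w, k in zip(weights, keys))
              for i in range(vocab_size)]
    return weights, output

if __name__ == "__main__":
    context = [0, 1, 2, 3]  # assumed token ids
    for beta in (0.0, 2.0, 8.0):
        weights, _ = copy_attention(context, query_token=2,
                                    vocab_size=4, beta=beta)
        print(beta, [round(w, 3) for w in weights])
```

With `beta = 0` the head attends uniformly. As `beta` grows, nearly all of the attention weight lands on the position holding the matching token, and the output reproduces that token, which is the behavior the term "copy head" refers to.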

IMPACT Provides a theoretical explanation for emergent behaviors in LLMs, potentially guiding future model design and training.

Linear attention
Transformers
Large language models
Attention
Softmax attention
Copy heads
Bayesian theory
Adam