Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 11h

Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

Researchers have developed a three-step method, Spectral Probe-Circuits, for identifying specific computational circuits within pretrained transformer models. The technique uses a spectral signal to rank attention heads by their sustained, content-dependent computation, and it requires neither labels nor attribution gradients. The method was validated across multiple model sizes and architectures, where it identified essential circuits such as the induction circuit; ablating that circuit caused a significant drop in performance on synthetic induction tasks.
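To make the evaluation concrete, here is a minimal sketch of what a synthetic induction task and an ablation-style accuracy comparison can look like. It is not the paper's code: the function names are hypothetical, the task format (a random token block followed by an exact repeat) is an assumed, commonly used setup, and the two predictors are hand-written stand-ins for a model with and without an induction circuit rather than real transformers.

```python
import random


def make_induction_sequence(vocab_size, half_len, seed):
    """Return a block of distinct random tokens followed by an exact repeat."""
    rng = random.Random(seed)
    block = rng.sample(range(vocab_size), half_len)  # distinct token ids
    return block + block


def induction_accuracy(predict_next, seq):
    """Score next-token predictions on the repeated (second) half only."""
    half = len(seq) // 2
    hits = 0
    total = 0
    for i in range(half, len(seq) - 1):
        prefix = seq[: i + 1]
        hits += int(predict_next(prefix) == seq[i + 1])
        total += 1
    return hits / total


def copy_predictor(prefix):
    """Stand-in for an intact induction circuit (assumption, not a real model):
    find the previous occurrence of the last token and copy what followed it."""
    last = prefix[-1]
    for j in range(len(prefix) - 2, -1, -1):
        if prefix[j] == last:
            return prefix[j + 1]
    return -1  # no earlier occurrence found


def ablated_predictor(prefix):
    """Stand-in for the ablated case: no lookup, always an invalid guess."""
    return -1


seq = make_induction_sequence(vocab_size=1000, half_len=16, seed=0)
intact = induction_accuracy(copy_predictor, seq)
ablated = induction_accuracy(ablated_predictor, seq)
print(f"intact={intact:.2f} ablated={ablated:.2f}")
```

In a real experiment the two predictors would be replaced by the same pretrained model run with and without the selected attention heads, and the gap between the two accuracies would be the quantity of interest.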

IMPACT Provides a new methodology for understanding internal model computations, potentially aiding in interpretability and debugging.

Transformers
Pythia
Spectral Probe-Circuits