Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers
Researchers have developed Spectral Probe-Circuits, a three-step method for identifying specific computational circuits within pretrained transformer models. The technique uses a spectral signal to rank attention heads by how much sustained, content-dependent computation they perform, without requiring labels or attribution gradients. The method has been validated across a range of model sizes and architectures and successfully identifies important circuits such as the induction circuit: ablating that circuit caused a significant drop in performance on synthetic induction tasks.
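The summary does not describe the synthetic induction task or the ablation protocol. A common formulation, assumed here rather than taken from the source, repeats a sequence of distinct random tokens and checks whether each token in the second copy is predicted by copying from the first copy ("after A came B, so after A again, predict B"). The toy sketch that follows contrasts an idealized induction rule with a fully "ablated" predictor that always abstains. All function names are hypothetical, and no real transformer or spectral ranking is involved.

```python
import random
from typing import Callable, List, Optional


def make_induction_sequence(vocab_size: int, half_len: int, rng: random.Random) -> List[int]:
    # Sample half_len distinct tokens, then repeat them: [A B C ... A B C ...]
    first_half = rng.sample(range(vocab_size), half_len)
    return first_half + first_half


def induction_predict(prefix: List[int]) -> Optional[int]:
    # Idealized induction rule: find the most recent earlier occurrence of the
    # current (last) token and predict the token that followed it.
    current = prefix[-1]
    for i in range(len(prefix) - 2, -1, -1):
        if prefix[i] == current:
            return prefix[i + 1]
    return None  # no earlier occurrence, so the rule abstains


def ablated_predict(prefix: List[int]) -> Optional[int]:
    # Stand-in for a model whose induction heads are ablated: it never copies.
    return None


def induction_accuracy(
    predict: Callable[[List[int]], Optional[int]], seqs: List[List[int]], half_len: int
) -> float:
    # Score only second-copy positions whose preceding token has an earlier
    # occurrence, i.e. positions where copying from the first copy is possible.
    correct = total = 0
    for seq in seqs:
        for t in range(half_len + 1, 2 * half_len):
            total += 1
            if predict(seq[:t]) == seq[t]:
                correct += 1
    return correct / total


if __name__ == "__main__":
    rng = random.Random(0)
    seqs = [make_induction_sequence(vocab_size=50, half_len=10, rng=rng) for _ in range(100)]
    print(induction_accuracy(induction_predict, seqs, 10))  # 1.0
    print(induction_accuracy(ablated_predict, seqs, 10))    # 0.0
```

In a real evaluation, `induction_predict` would be replaced by the model's argmax next-token prediction, computed once with all heads intact and once with the identified heads ablated. The gap between the two accuracies is one way to quantify the kind of performance drop the summary reports.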
IMPACT: Provides a new methodology for understanding the internal computations of transformer models, potentially aiding interpretability and debugging.