Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 19h · [2 sources]

Demystifying Variance in Circuit Discovery of LLMs

A new arXiv paper examines variability in circuit discovery methods for Large Language Models (LLMs). The study identifies three sources of variance: resampling, rephrasing, and sample-wise variance. The authors introduce CEAP, a method that improves on the existing EAP-IG technique by reducing resampling variance. They also argue that rephrasing variance suggests LLMs may be inherently difficult to steer, because differently worded prompts can activate different internal circuits. Sample-wise variance, they argue, is largely benign: it stems from how unfaithfulness is defined rather than from defects in the circuits.
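To make the idea of resampling variance concrete, here is a minimal Python sketch. It assumes a discovered circuit can be represented as a set of edges and scores agreement between runs with mean pairwise Jaccard similarity. The metric, the function names, and the edge labels are all illustrative assumptions, not the paper's definitions and not part of CEAP or EAP-IG.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two edge sets (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mean_pairwise_overlap(circuits):
    """Average Jaccard similarity over all pairs of discovered circuits."""
    pairs = list(combinations(circuits, 2))
    if not pairs:
        raise ValueError("need at least two circuits to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical edge sets from three discovery runs, each on a different
# resample of the same dataset. The node names are made up for illustration.
runs = [
    {("a0.h1", "m2"), ("m2", "logits"), ("a1.h3", "logits")},
    {("a0.h1", "m2"), ("m2", "logits"), ("a1.h0", "logits")},
    {("a0.h1", "m2"), ("m2", "logits"), ("a1.h3", "logits")},
]
overlap = mean_pairwise_overlap(runs)
print(f"mean pairwise overlap: {overlap:.3f}")
```

Under this toy measure, an overlap near 1 would mean the runs agree on the circuit, while a lower value would indicate that the discovered circuit depends on which samples were drawn.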

IMPACT Introduces a new method to improve LLM interpretability and control, potentially aiding in understanding and steering model behavior.
