A new research paper published on arXiv examines variability in circuit discovery methods for large language models (LLMs). The study identifies three main sources of variance: resampling, rephrasing, and sample-wise variance. The authors introduce CEAP, a method that improves on the existing EAP-IG technique by reducing resampling variance. They also suggest that rephrasing variance indicates LLMs may be inherently difficult to steer, because differently worded prompts can activate different internal circuits. Sample-wise variance, they argue, is largely benign and stems from how unfaithfulness is defined rather than from defects in the circuits.
IMPACT Introduces CEAP, a circuit discovery method that reduces resampling variance relative to EAP-IG, potentially aiding efforts to interpret and steer LLM behavior.
RANK_REASON The cluster contains a research paper detailing a new method for circuit discovery in LLMs, published on arXiv.
- arXiv
- EAP-IG
- LLMs
AI-generated summary · Google Gemini · from 2 sources.