From Circuit Evidence to Mechanistic Theory: An Inductive Logic Approach
Researchers have developed a formal framework to advance mechanistic interpretability in neural networks. The approach treats circuit interpretation as inductive theory construction and gives discovered circuits a shared representation. It characterizes circuits using Causal Functional Signatures (CFS) and inductive logic programming (ILP), which allows circuits to be compared explicitly and transferred across model scales and architectures.
AI IMPACT: Provides a formal infrastructure for cumulative mechanistic science, enabling more systematic and comparable interpretability research.