PLOT: Progressive Localization via Optimal Transport in Neural Causal Abstraction
Researchers have developed PLOT, a new framework for mechanistic interpretability in neural networks. PLOT uses optimal transport to efficiently localize causal variables within a neural network's computation. This method speeds up existing techniques like Distributed Alignment Search (DAS) by providing a more targeted approach to identifying relevant neural sites, making causal abstraction research more scalable and accurate.
AI IMPACT: Enables more efficient and scalable research into understanding how neural networks function internally.
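
To make the optimal-transport idea concrete, here is a minimal sketch of entropic optimal transport (Sinkhorn iterations) used to match causal variables to candidate neural sites. It is a generic illustration, not PLOT's actual procedure: the cost matrix, the uniform marginals, the function names `sinkhorn` and `top_sites`, and the `reg` value are all assumptions made for this example.

```python
import math

def sinkhorn(cost, reg=0.5, n_iters=200):
    """Entropic optimal transport between uniform marginals (generic Sinkhorn).

    cost[i][j] is a hypothetical mismatch score between causal variable i
    and candidate neural site j; lower means a better match.
    """
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # mass assigned to each causal variable (assumed uniform)
    b = [1.0 / m] * m  # mass assigned to each neural site (assumed uniform)
    # Gibbs kernel: cheap pairs get large weights.
    K = [[math.exp(-c / reg) for c in row] for row in cost]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: plan[i][j] is the mass moved from variable i to site j.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def top_sites(plan):
    """For each causal variable, return the site index receiving the most mass."""
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

# Made-up costs: 2 causal variables (rows) x 4 candidate sites (columns).
cost = [
    [0.9, 0.1, 0.8, 0.7],
    [0.8, 0.9, 0.2, 0.6],
]
plan = sinkhorn(cost)
sites = top_sites(plan)
print(sites)
```

In a sketch like this, the transport plan acts as a soft shortlist: sites that receive most of a variable's mass would be the ones worth passing to a more expensive search such as DAS, while low-mass sites could be skipped.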