Researchers have introduced a theory of susceptibilities for interpreting neural networks, drawing parallels to statistical mechanics. The theory defines the susceptibility of an observable to a data perturbation as the derivative of its posterior expectation, which, via the fluctuation-dissipation theorem, equals a posterior covariance. Different choices of observable yield distinct diagnostic objects, such as the influence matrix for per-sample losses and the structural susceptibility matrix for component-localized observables, offering insight into model behavior and the geometry of the loss landscape.
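The covariance form of the susceptibility lends itself to a simple Monte Carlo estimator. The sketch below is a hypothetical illustration, not the paper's code: it assumes we already have posterior draws of per-sample losses (e.g. from an SGLD sampler), an observable evaluated on the same draws, and an inverse temperature `beta`; the function names `susceptibility` and `influence_matrix` are invented for this example.

```python
import numpy as np

def susceptibility(obs, losses, beta=1.0):
    """Susceptibility of `obs` to upweighting each sample's loss.

    Under the fluctuation-dissipation relation, the derivative of the
    posterior expectation E[obs] with respect to a perturbation
    eps_i * loss_i is -beta * Cov(obs, loss_i) under the unperturbed
    posterior, so we estimate it from centered posterior draws.
    """
    obs_c = obs - obs.mean()                 # center the observable
    loss_c = losses - losses.mean(axis=0)    # center each per-sample loss
    return -beta * (obs_c @ loss_c) / len(obs)

def influence_matrix(losses, beta=1.0):
    """Influence matrix: choosing the observables to be the per-sample
    losses themselves gives the matrix -beta * Cov(loss_i, loss_j)."""
    loss_c = losses - losses.mean(axis=0)
    return -beta * (loss_c.T @ loss_c) / losses.shape[0]

# Toy usage with synthetic "posterior draws": 1000 draws, 5 data points.
rng = np.random.default_rng(0)
losses = rng.normal(size=(1000, 5))
obs = losses[:, 0] + 0.1 * rng.normal(size=1000)  # observable tracks sample 0
chi = susceptibility(obs, losses)  # largest magnitude at index 0
M = influence_matrix(losses)       # symmetric, negative diagonal
```

With the observable constructed to track the first sample's loss, the estimated susceptibility is largest in magnitude for that sample, and the influence matrix is symmetric with a negative diagonal (each loss is maximally responsive to its own perturbation).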
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Introduces a novel theoretical framework for understanding neural network behavior, potentially aiding in model interpretability and debugging.
RANK_REASON The cluster contains a single academic paper detailing a new theoretical framework for interpreting neural networks.