Actionable Interpretability Must Be Defined in Terms of Symmetries
A new paper proposes redefining interpretability in AI in terms of symmetries. The authors argue that current definitions are inadequate as a basis for formal testing or model design. They introduce four symmetries (inference equivariance, information invariance, concept-closure invariance, and structural invariance) which, they argue, can formalize interpretable models as a subset of probabilistic models. The approach aims to unify interpretable inference methods and to provide a formal system for verifying compliance with safety and regulatory standards.
IMPACT: Proposes a new formal framework for AI interpretability that could enable more rigorous verification of safety and regulatory compliance.