What is Missing? Explaining Neurons Activated by Absent Concepts
Researchers have introduced methods that extend explainable AI (XAI) to cases where a neuron's activation signals the absence of a concept rather than its presence. Current XAI techniques often fail to detect these 'encoded absences,' which are common in deep neural networks. The proposed extensions to attribution and feature visualization methods can reveal such absent concepts, which improves model understanding and debiasing, as demonstrated in experiments with ImageNet models.
IMPACT: Enhances interpretability of AI models by revealing hidden negative relationships between concepts and neuron activations, potentially improving safety and debiasing.
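
To make the idea of an encoded absence concrete, here is a minimal toy sketch in Python. It is not the researchers' method: the single ReLU neuron, its weights, and the concept names ("stripes", "wheels") are all hypothetical and chosen only for illustration. It shows why a standard gradient-times-input attribution can assign nothing to a concept whose absence is exactly what makes the neuron fire, while a simple counterfactual check (inserting the concept and measuring the drop in activation) does expose it.

```python
def pre_activation(x, w, b):
    """Weighted sum of the inputs plus the bias."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def neuron(x, w, b):
    """Toy ReLU neuron: max(0, w.x + b)."""
    return max(0.0, pre_activation(x, w, b))


def gradient_times_input(x, w, b):
    """Gradient-times-input attribution for the toy ReLU neuron."""
    gate = 1.0 if pre_activation(x, w, b) > 0 else 0.0
    return [gate * wi * xi for wi, xi in zip(w, x)]


# Hypothetical concept scores per input: [stripes, wheels].
w = [-2.0, 0.0]  # negative weight: the presence of stripes suppresses the neuron
b = 1.0          # positive bias: the neuron fires when stripes are missing

x_absent = [0.0, 0.5]   # stripes absent
x_present = [1.0, 0.5]  # stripes present

act_absent = neuron(x_absent, w, b)    # neuron is active
act_present = neuron(x_present, w, b)  # neuron is silenced

# Standard attribution on the input that activates the neuron:
# the stripes score is 0, so its attribution is 0 as well.
attr_absent = gradient_times_input(x_absent, w, b)

# Counterfactual check: how much activation is lost when stripes are inserted?
absence_effect = act_absent - act_present

print("activation without stripes:", act_absent)
print("activation with stripes:", act_present)
print("stripes attribution magnitude:", abs(attr_absent[0]))
print("activation lost by inserting stripes:", absence_effect)
```

In this toy setup the attribution for "stripes" is zero even though the neuron is active only because stripes are missing, which is the kind of blind spot the summary describes.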