Testing the Test: Score-Direction Instability in Class-Split Anomaly Detection
A new research paper highlights a critical flaw in how anomaly detection models are evaluated. The study finds that standard within-dataset class-split evaluation, where one class of a labeled dataset is held out to play the anomaly, can be unreliable when that anomaly class overlaps with the normal data distribution in representation space. Under such overlap, anomaly scores can become unstable or even invert, and the preferred score direction may change depending on the unknown anomaly class. The researchers propose a simple diagnostic, neighborhood class leakage, to predict this instability, and suggest that current benchmarks be read as geometry-dependent stress tests rather than definitive measures of anomaly detection capability.
IMPACT: Highlights potential unreliability in current anomaly detection benchmarks and urges a re-evaluation of model performance claims.
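The summary does not give the paper's definition of neighborhood class leakage, so the following is only a minimal sketch of one plausible reading: for each anomaly-class point, the fraction of its k nearest neighbors that are normal points, averaged over the class. The toy 2-D data, the distance-to-center anomaly score, and every function name here are illustrative assumptions, not the authors' method. The sketch shows how a held-out class that overlaps the normal data can push AUROC below 0.5, meaning the flipped score would rank better, while a well-separated class does not.

```python
import math
import random


def gaussian_cloud(rng, center, std, n):
    """Draw n 2-D points from an isotropic Gaussian (toy 'representation space')."""
    return [(rng.gauss(center[0], std), rng.gauss(center[1], std)) for _ in range(n)]


def auroc(normal_scores, anomaly_scores):
    """Probability that a random anomaly scores higher than a random normal point."""
    wins = sum((a > n) + 0.5 * (a == n) for a in anomaly_scores for n in normal_scores)
    return wins / (len(anomaly_scores) * len(normal_scores))


def neighborhood_leakage(normal, anomaly, k=5):
    """ASSUMED definition: mean fraction of normal points among each
    anomaly point's k nearest neighbors in the pooled data."""
    pooled = [(p, False) for p in normal] + [(p, True) for p in anomaly]
    fractions = []
    for i, (p, is_anomaly) in enumerate(pooled):
        if not is_anomaly:
            continue
        neighbors = sorted(
            (math.dist(p, q), q_is_anomaly)
            for j, (q, q_is_anomaly) in enumerate(pooled)
            if j != i
        )[:k]
        fractions.append(sum(not flag for _, flag in neighbors) / k)
    return sum(fractions) / len(fractions)


rng = random.Random(0)
normal_train = gaussian_cloud(rng, (0.0, 0.0), 1.0, 100)
normal_test = gaussian_cloud(rng, (0.0, 0.0), 1.0, 100)
far_class = gaussian_cloud(rng, (6.0, 6.0), 0.5, 40)      # well separated
overlap_class = gaussian_cloud(rng, (0.0, 0.0), 0.5, 40)  # sits inside the normal data

# Hypothetical anomaly score: distance to the mean of the normal training data.
center = (
    sum(x for x, _ in normal_train) / len(normal_train),
    sum(y for _, y in normal_train) / len(normal_train),
)


def score(point):
    return math.dist(point, center)


def evaluate(anomaly):
    leak = neighborhood_leakage(normal_train, anomaly)
    auc = auroc([score(p) for p in normal_test], [score(p) for p in anomaly])
    return leak, auc


far_leak, far_auc = evaluate(far_class)
overlap_leak, overlap_auc = evaluate(overlap_class)

for name, leak, auc in [("far", far_leak, far_auc), ("overlap", overlap_leak, overlap_auc)]:
    # An AUROC below 0.5 means the negated score (AUROC 1 - auc) would rank better.
    print(f"{name}: leakage={leak:.2f} auroc={auc:.2f} flipped_auroc={1 - auc:.2f}")
```

In this toy setup the same scorer looks strong or inverted depending only on which class is held out, which is the kind of geometry dependence the summary describes. Higher leakage coincides with the inverted direction here, but that is a property of this constructed example, not evidence about the paper's results.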