Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 11h

Comparing Linear Probes with Mahalanobis Cosine Similarity

Researchers show, both theoretically and empirically, that Mahalanobis Cosine Similarity (MCS) strongly predicts a linear probe's out-of-distribution (OOD) AUROC. The relationship holds across models, layers, and concept domains. The study proves that, for balanced classes with Gaussian projections, both OOD AUROC and MCS to a reference probe are linear functions of the probe's signal-to-noise ratio on test data. The authors present MCS as a theoretically sound and practically effective alternative to Euclidean cosine similarity for comparing linear probes in interpretability research.
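To make the contrast with Euclidean cosine similarity concrete, here is a minimal sketch in Python. The brief does not give the paper's exact definition of MCS, so this assumes the common form in which the inner product between two probe directions is weighted by the activation covariance matrix; the function names, the toy covariance, and the two probes are all hypothetical.

```python
import numpy as np

def euclidean_cosine(u, v):
    """Standard cosine similarity between two probe weight vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / np.sqrt((u @ u) * (v @ v)))

def mahalanobis_cosine(u, v, cov):
    """Cosine similarity under the inner product <u, v> = u^T cov v.

    ASSUMPTION: cov is the covariance of the activations the probes read.
    The paper's definition may differ (for example, in which covariance
    is used); this is an illustration, not the authors' implementation.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    cov = np.asarray(cov, dtype=float)
    return float(u @ cov @ v / np.sqrt((u @ cov @ u) * (v @ cov @ v)))

# Hypothetical toy setup: feature 0 has far more variance than feature 1.
cov = np.diag([100.0, 1.0])
probe_a = np.array([1.0, 0.0])
probe_b = np.array([1.0, 1.0])

print(euclidean_cosine(probe_a, probe_b))         # about 0.707
print(mahalanobis_cosine(probe_a, probe_b, cov))  # about 0.995
```

Under this assumed definition, MCS equals the correlation between the two probes' outputs on centered data with covariance cov. The two toy probes are 45 degrees apart in weight space, yet their outputs are almost perfectly correlated because the low-variance feature contributes little, which is the kind of difference Euclidean cosine similarity cannot see.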

IMPACT Gives interpretability researchers a theoretically grounded way to compare linear probes, which may improve understanding of model behavior.

Hugging Face
arXiv
Gaussian function
linear probes
Mahalanobis Cosine Similarity
OOD AUROC
Euclidean cosine similarity