Researchers have shown, both theoretically and empirically, that Mahalanobis Cosine Similarity (MCS) is a strong predictor of a linear probe's out-of-distribution (OOD) AUROC. The relationship holds across a range of models, layers, and concept domains. The study proves that, for balanced classes with Gaussian projections, both OOD AUROC and MCS to a reference probe are linear functions of the probe's signal-to-noise ratio on test data. MCS is presented as a theoretically sound and practically effective alternative to Euclidean cosine similarity for comparing linear probes in interpretability research.
AI IMPACT Provides a theoretically grounded method for evaluating AI model interpretability, potentially improving understanding of model behavior.
RANK_REASON The cluster contains an academic paper detailing theoretical and empirical findings on a new method for analyzing linear probes in machine learning interpretability.
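The summary does not spell out how MCS is computed, so the following is only a minimal sketch under stated assumptions: MCS is taken to be the cosine of two probe weight vectors under the inner product induced by the activation covariance matrix, and a probe's signal-to-noise ratio is taken to be the class-mean gap of its scores divided by their standard deviation. The function names, the toy covariance, and the choice of an LDA-style reference probe are all hypothetical and are not taken from the paper.

```python
import math
import numpy as np


def euclidean_cosine(w_a, w_b):
    """Ordinary cosine similarity between two probe weight vectors."""
    return float(w_a @ w_b / (np.linalg.norm(w_a) * np.linalg.norm(w_b)))


def mahalanobis_cosine(w_a, w_b, cov):
    """Cosine under the inner product <u, v> = u^T cov v.

    ASSUMPTION: this covariance-weighted form stands in for the paper's MCS.
    It equals the correlation between the two probes' scores on data with
    covariance `cov`.
    """
    num = w_a @ cov @ w_b
    den = math.sqrt((w_a @ cov @ w_a) * (w_b @ cov @ w_b))
    return float(num / den)


def probe_snr(w, mu_neg, mu_pos, cov):
    """Class-mean gap of the scores w^T x divided by their standard deviation.

    ASSUMPTION: both classes share the covariance `cov`.
    """
    return float(w @ (mu_pos - mu_neg) / math.sqrt(w @ cov @ w))


def gaussian_auroc(snr):
    """AUROC for two equal-variance Gaussian score distributions.

    Uses AUROC = Phi(snr / sqrt(2)), written with math.erf.
    """
    return 0.5 * (1.0 + math.erf(snr / 2.0))


# Toy 2-D setup (hypothetical numbers): the second feature is very noisy.
cov = np.diag([1.0, 100.0])
mu_neg = np.array([0.0, 0.0])
mu_pos = np.array([1.0, 0.0])

w_ref = np.linalg.solve(cov, mu_pos - mu_neg)  # LDA-style reference probe
w_other = np.array([1.0, 0.1])                 # leaks a little weight onto the noisy feature

cos_euclid = euclidean_cosine(w_ref, w_other)
cos_mahal = mahalanobis_cosine(w_ref, w_other, cov)
snr_ref = probe_snr(w_ref, mu_neg, mu_pos, cov)
snr_other = probe_snr(w_other, mu_neg, mu_pos, cov)

print("Euclidean cosine:  ", cos_euclid)
print("Mahalanobis cosine:", cos_mahal)
print("AUROC (reference): ", gaussian_auroc(snr_ref))
print("AUROC (other):     ", gaussian_auroc(snr_other))
```

In this toy setup the two probes look nearly identical under Euclidean cosine, yet the covariance-weighted cosine is much lower, and it matches the ratio of the two probes' signal-to-noise ratios. That identity holds here because the reference probe is the LDA direction; whether the paper uses the same reference probe is not stated in the summary.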
- arXiv
- Euclidean cosine similarity
- Gaussian function
- Hugging Face
- linear probes
- Mahalanobis Cosine Similarity
- OOD AUROC
AI-generated summary · Google Gemini · from 1 source.