New theory links Mahalanobis Cosine Similarity to probe performance

By PulseAugur Editorial · [1 source] · 2026-06-19 04:00

Researchers have shown, both theoretically and empirically, that Mahalanobis Cosine Similarity (MCS) is a strong predictor of a linear probe's out-of-distribution (OOD) AUROC. The relationship holds across models, layers, and concept domains. The study proves that, for balanced classes with Gaussian projections, both OOD AUROC and MCS to a reference probe are increasing functions of the probe's signal-to-noise ratio on test data. The authors present MCS as a theoretically grounded and practically effective alternative to Euclidean cosine similarity for comparing linear probes in interpretability research.
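The link between signal-to-noise ratio, AUROC, and MCS can be illustrated with a small numerical sketch. It is not the paper's code and rests on assumptions the summary does not state: the two classes share one diagonal covariance, the reference probe is the LDA direction (inverse covariance times the difference of class means), and every name and number below is hypothetical. Under those assumptions AUROC equals Phi(d / sqrt(2)), where d is the probe's signal-to-noise ratio, and MCS to the reference equals the probe's d divided by the reference's d.

```python
import math


def snr(w, delta_mu, var):
    """d = (w . delta_mu) / sqrt(w^T Sigma w), with Sigma = diag(var)."""
    signal = sum(wi * di for wi, di in zip(w, delta_mu))
    noise = math.sqrt(sum(wi * wi * vi for wi, vi in zip(w, var)))
    return signal / noise


def auroc_from_snr(d):
    """AUROC for two equal-variance Gaussian score distributions.

    AUROC = Phi(d / sqrt(2)), and Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    so the argument of erf simplifies to d / 2.
    """
    return 0.5 * (1.0 + math.erf(d / 2.0))


def mcs_to_lda_reference(w, delta_mu, var):
    """MCS between probe w and the assumed reference Sigma^{-1} delta_mu."""
    w_ref = [di / vi for di, vi in zip(delta_mu, var)]
    num = sum(wi * vi * ri for wi, vi, ri in zip(w, var, w_ref))
    norm_w = math.sqrt(sum(wi * wi * vi for wi, vi in zip(w, var)))
    norm_ref = math.sqrt(sum(ri * ri * vi for ri, vi in zip(w_ref, var)))
    return num / (norm_w * norm_ref)


# Hypothetical test-set statistics: class-mean difference and per-axis variance.
delta_mu = [2.0, 1.0]
var = [4.0, 1.0]

# SNR of the assumed reference probe itself.
snr_ref = snr([d / v for d, v in zip(delta_mu, var)], delta_mu, var)

# A few hypothetical probe directions.
probes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 1.0]]
results = []
for w in probes:
    d = snr(w, delta_mu, var)
    results.append((d, auroc_from_snr(d), mcs_to_lda_reference(w, delta_mu, var)))

# Sorting by SNR shows AUROC and MCS rising together.
for d, auroc, mcs in sorted(results):
    print(f"SNR={d:.3f}  AUROC={auroc:.3f}  MCS={mcs:.3f}")
```

In this setting, ranking probes by MCS to the reference gives the same order as ranking them by AUROC, because both quantities increase with the same signal-to-noise ratio.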

IMPACT Provides a theoretically grounded method for evaluating AI model interpretability, potentially improving understanding of model behavior.

RANK_REASON The cluster contains an academic paper detailing theoretical and empirical findings on a new method for analyzing linear probes in machine learning interpretability.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte · 2026-06-19 04:00

Comparing Linear Probes with Mahalanobis Cosine Similarity

arXiv:2606.19603v1 Announce Type: new Abstract: Linear probes are widely used in interpretability research and often compared by cosine similarity. The Mahalanobis cosine similarity (MCS) between two directions, which reweights the inner product by test data covariance, is a natu…
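The covariance reweighting described in the abstract can be sketched in a few lines. This is a minimal illustration, assuming the usual form MCS(u, v) = uᵀΣv / sqrt(uᵀΣu · vᵀΣv) with Σ the test-data covariance. The function names and the toy covariance are hypothetical and are not taken from the paper.

```python
import math


def quad_form(u, cov, v):
    """Return u^T cov v for plain Python lists."""
    return sum(u[i] * cov[i][j] * v[j]
               for i in range(len(u)) for j in range(len(v)))


def mahalanobis_cosine(u, v, cov):
    """Cosine similarity under the inner product induced by cov."""
    return quad_form(u, cov, v) / math.sqrt(
        quad_form(u, cov, u) * quad_form(v, cov, v))


def euclidean_cosine(u, v):
    """Ordinary cosine similarity: the special case cov = identity."""
    n = len(u)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return mahalanobis_cosine(u, v, identity)


# Hypothetical 2-D case: the data vary far more along axis 0 than axis 1.
cov = [[100.0, 0.0],
       [0.0, 1.0]]
u = [1.0, 0.0]
v = [1.0, 1.0]

print(euclidean_cosine(u, v))         # about 0.707: the directions look different
print(mahalanobis_cosine(u, v, cov))  # about 0.995: nearly the same on this data
```

The two probes differ only along a low-variance axis, so their outputs on this data are almost perfectly correlated. Euclidean cosine similarity ignores that, while the covariance-weighted version reflects it.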
