Interpretable Self-Supervised Learning via Representer Landmarks and Nyström Approximation
Three recent papers propose methods for interpreting the internal workings of machine learning models. The first trains lightweight adapters on frozen language models to enable reliable self-interpretation, improving performance on tasks such as topic identification and implicit reasoning. The second, IdEst, uses intrinsic dimension estimation to assess self-supervised learning (SSL) representations, yielding a measure that correlates strongly with downstream performance and enables efficient hyperparameter tuning. The third introduces KREPES, a framework that uses representer landmarks and Nyström approximation to interpret SSL representations analytically, revealing algorithmic biases and enabling scalable analysis.
AI IMPACT These advances in interpretability could make AI systems more trustworthy and easier to understand, and could simplify debugging and bias detection.