Mechanistic Interpretability of EEG Foundation Models via Sparse Autoencoders
Researchers have developed a method that uses Sparse Autoencoders to interpret the internal representations of EEG foundation models, which remain opaque despite their clinical success. The framework grounds the extracted features in clinical data, which makes it possible to benchmark model representations and to identify critical failures such as concept entanglement and "wrecking-ball" interventions. The approach also translates latent manipulations into physiologically interpretable frequency signatures, offering a path toward greater clinical trust in, and understanding of, these AI systems.
AI IMPACT: Provides a framework for understanding and improving the reliability of AI models used in clinical settings.
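
To make the idea concrete, here is a minimal sketch of a generic sparse autoencoder (SAE) and a single-feature intervention, written in NumPy. It is not the researchers' implementation: the layer sizes, the ReLU encoder, the L1 sparsity penalty, the random untrained weights, and every name in it (`encode`, `decode`, `sae_loss`, `ablate_feature`) are assumptions chosen for illustration, and the random batch `x` merely stands in for activations taken from an EEG foundation model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: width of the model activations and of the SAE dictionary.
d_model, d_latent = 16, 64

# Randomly initialised, untrained SAE parameters (illustration only).
W_enc = rng.normal(scale=0.1, size=(d_model, d_latent))
b_enc = np.zeros(d_latent)
W_dec = rng.normal(scale=0.1, size=(d_latent, d_model))
b_dec = np.zeros(d_model)


def encode(x):
    """Map activations to non-negative latent features (ReLU encoder)."""
    return np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)


def decode(z):
    """Reconstruct activations as a weighted sum of dictionary rows."""
    return z @ W_dec + b_dec


def sae_loss(x, l1_coeff=1e-3):
    """Mean squared reconstruction error plus an L1 penalty on the latents."""
    z = encode(x)
    mse = np.mean((x - decode(z)) ** 2)
    l1 = np.mean(np.sum(np.abs(z), axis=-1))
    return mse + l1_coeff * l1


def ablate_feature(x, idx):
    """Zero one latent feature and decode, returning the edited activations."""
    z = encode(x)          # fresh array, safe to modify in place
    z[:, idx] = 0.0
    return decode(z)


# Stand-in for a batch of activations from an EEG foundation model.
x = rng.normal(size=(8, d_model))
z = encode(x)
x_edit = ablate_feature(x, idx=3)

# Change in the reconstruction attributable to latent 3 alone.
delta = x_edit - decode(z)
```

Because the decoder is linear, `delta` equals the negated contribution of the ablated feature (`-z[:, 3]` times row 3 of `W_dec`), so an intervention of this kind edits exactly one dictionary direction. Mapping such an edit back to a frequency signature would additionally require the foundation model itself, which this sketch does not attempt.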