Researchers have introduced and formalized the ensembling of sparse autoencoders (SAEs) to improve the interpretability and utility of features extracted from neural networks. The approach addresses a limitation of single SAEs, each of which captures only a subset of the extractable features. Combining multiple SAEs through naive bagging and boosting is shown to reduce reconstruction error and make feature extraction more stable. In empirical evaluations on language models, SAE ensembles reconstruct activations better than expanded SAEs and perform better on downstream tasks such as concept detection and spurious correlation removal.
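The two combination rules can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes that naive bagging averages the reconstructions of SAEs trained from different random initializations, and that boosting trains each SAE on the residual left by the previous ones and sums their outputs. The toy trainer, the hyperparameters, the random data, and all function names (`train_sae`, `bagging_reconstruct`, `boosting_reconstruct`, and so on) are hypothetical.

```python
import numpy as np

def train_sae(X, n_latents=16, seed=0, l1=1e-3, lr=0.02, steps=300):
    """Toy one-layer ReLU SAE fit by full-batch gradient descent (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_latents))
    W_dec = rng.normal(scale=0.1, size=(n_latents, d))
    b_enc, b_dec = np.zeros(n_latents), np.zeros(d)
    for _ in range(steps):
        pre = X @ W_enc + b_enc
        z = np.maximum(pre, 0.0)                      # sparse latent code
        g_out = 2.0 * (z @ W_dec + b_dec - X) / n     # gradient of squared error
        g_z = g_out @ W_dec.T + l1 * np.sign(z) / n   # plus L1 sparsity subgradient
        g_pre = g_z * (pre > 0)                       # backprop through ReLU
        W_dec -= lr * (z.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_pre)
        b_enc -= lr * g_pre.sum(axis=0)
    return W_enc, b_enc, W_dec, b_dec

def reconstruct(sae, X):
    W_enc, b_enc, W_dec, b_dec = sae
    return np.maximum(X @ W_enc + b_enc, 0.0) @ W_dec + b_dec

def mse(X_hat, X):
    return float(np.mean((X_hat - X) ** 2))

def bagging_ensemble(X, n_members=3):
    # Assumption: members differ only in their random initialization (seed).
    return [train_sae(X, seed=s) for s in range(n_members)]

def bagging_reconstruct(members, X):
    # Assumption: the ensemble output is the average of member reconstructions.
    return np.mean([reconstruct(m, X) for m in members], axis=0)

def boosting_ensemble(X, n_members=3):
    # Assumption: each member is trained on the residual left by earlier members.
    members, residual = [], X.copy()
    for s in range(n_members):
        sae = train_sae(residual, seed=s)
        members.append(sae)
        residual = residual - reconstruct(sae, residual)
    return members

def boosting_reconstruct(members, X):
    # Assumption: the ensemble output is the sum of the stage-wise reconstructions.
    total, residual = np.zeros_like(X), X.copy()
    for sae in members:
        step = reconstruct(sae, residual)
        total += step
        residual = residual - step
    return total

# Random stand-in for model activations (hypothetical data).
X = np.random.default_rng(0).normal(size=(256, 8))
bag = bagging_ensemble(X)
boost = boosting_ensemble(X)
print("bagging MSE: ", mse(bagging_reconstruct(bag, X), X))
print("boosting MSE:", mse(boosting_reconstruct(boost, X), X))
```

Because squared error is convex, the averaged reconstruction in this sketch can never have a higher mean squared error than the average of its members' errors. Whether it beats a single larger SAE is an empirical question that this toy example does not address.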
AI-generated summary · Google Gemini · from 1 source.