Researchers at EleutherAI have found that sparse autoencoders (SAEs) trained on identical data with identical configurations do not consistently learn the same features. When two SAEs were trained from different random initializations, only about 53% of their learned features were shared. This overlap decreased as the SAEs grew larger, suggesting that larger SAEs may develop more arbitrary or diverse internal representations. The study used the Hungarian algorithm to match latents between the two SAEs and measure their similarity, and it suggests that feature splitting and absorption might contribute to the disjoint representations.
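The matching step can be illustrated with a minimal sketch. It assumes each SAE is represented by its decoder matrix (one row per latent), that similarity is the cosine similarity between decoder rows, and that a matched pair counts as "shared" when its similarity exceeds a threshold. The function names, the use of decoder rows, and the 0.7 threshold are illustrative assumptions, not details taken from the study.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_latents(dec_a, dec_b):
    """Pair latents of two SAEs one-to-one with the Hungarian algorithm.

    dec_a, dec_b: arrays of shape (n_latents, d_model), one row per latent
    (assumed here to be decoder directions).
    """
    # Normalize rows so the dot product is a cosine similarity.
    a = dec_a / np.linalg.norm(dec_a, axis=1, keepdims=True)
    b = dec_b / np.linalg.norm(dec_b, axis=1, keepdims=True)
    sim = a @ b.T  # sim[i, j] = cosine similarity of latent i in A and j in B

    # Find the one-to-one assignment that maximizes total similarity.
    rows, cols = linear_sum_assignment(sim, maximize=True)
    return rows, cols, sim[rows, cols]


def shared_fraction(dec_a, dec_b, threshold=0.7):
    """Fraction of matched pairs whose similarity exceeds the threshold.

    The threshold is a hypothetical choice for illustration only.
    """
    _, _, matched_sim = match_latents(dec_a, dec_b)
    return float(np.mean(matched_sim > threshold))
```

Calling `shared_fraction(dec_a, dec_b)` on two decoder matrices of equal shape returns a number between 0 and 1. The result depends on the chosen threshold and on which weights are compared, so it should not be expected to reproduce the reported figure without the study's exact criteria.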
Summary written by gemini-2.5-flash-lite from 1 source.