Sparse autoencoders show unstable features form reproducible subspaces

By PulseAugur Editorial · [2 sources] · 2026-06-10 14:32

Researchers have investigated the reproducibility of features learned by sparse autoencoders (SAEs), a common tool for interpreting neural network representations. Their study finds that although individual features can be unstable across training runs, the unstable features often aggregate into reproducible lower-rank subspaces. Stable features carry most of the signal relevant for reconstruction and prediction, whereas unstable features have minimal impact and are linked to surface-level triggers.
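The distinction between unstable individual features and a reproducible subspace can be illustrated with a toy example. The sketch below is not the paper's method or metric: the function names `best_match_cosine` and `subspace_overlap`, the two hand-built "runs", and the choice of cosine matching and principal angles are all assumptions made for illustration. It shows two sets of feature directions that match poorly one-to-one yet span exactly the same plane.

```python
import numpy as np

def best_match_cosine(run_a, run_b):
    """For each feature (row) in run_a, the highest cosine similarity to any feature in run_b."""
    a = run_a / np.linalg.norm(run_a, axis=1, keepdims=True)
    b = run_b / np.linalg.norm(run_b, axis=1, keepdims=True)
    return (a @ b.T).max(axis=1)

def subspace_overlap(run_a, run_b):
    """Mean squared cosine of the principal angles between the row spans (1.0 means identical spans)."""
    qa, _ = np.linalg.qr(run_a.T)  # orthonormal basis for the span of run_a's features
    qb, _ = np.linalg.qr(run_b.T)  # orthonormal basis for the span of run_b's features
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Hypothetical toy data: two "training runs", each with two features in a 4-dimensional space.
# Run B's features are run A's features rotated by 45 degrees inside the same plane.
run_a = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
run_b = np.array([[1.0, 1.0, 0.0, 0.0],
                  [1.0, -1.0, 0.0, 0.0]]) / np.sqrt(2)

per_feature = best_match_cosine(run_a, run_b)  # one-to-one match quality per feature
overlap = subspace_overlap(run_a, run_b)       # agreement of the spanned subspaces
print(per_feature, overlap)
```

In this toy case every feature's best match has cosine similarity of about 0.71, so no feature is reproduced exactly, while the subspace overlap is 1.0 because both runs span the same plane.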

IMPACT Clarifies how to interpret learned features in neural networks, potentially improving model interpretability and debugging.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.AI TIER_1 English(EN) · Gleb Gerasimov, Timofei Rusalev, Nikita Balagansky, Daniil Laptev, Vadim Kurochkin, Daniil Gavrilov · 2026-06-11 04:00

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

arXiv:2606.12138v1 · Announce Type: cross · Abstract: Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature …

arXiv cs.AI TIER_1 English(EN) · Daniil Gavrilov · 2026-06-10 14:32

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the …
