Brief · PulseAugur

RESEARCH · arXiv cs.LG · 1d · [4 sources]

Ensembling Sparse Autoencoders

Researchers have introduced several approaches to improving Sparse Autoencoders (SAEs), a tool for interpreting neural network activations. One method, the Rational Sparse Autoencoder (RSAE), replaces fixed activation functions with trainable rational functions, which improves reconstruction and downstream behavior metrics. A second proposes cosine scoring for SAEs, which scores features by directional alignment rather than raw activation magnitude; this aligns learned features better with recognizable concepts, especially for normalized representations. A third formalizes ensembling of SAEs and reports better reconstruction accuracy and stability than single SAEs or expanded versions.
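To make the three ideas concrete, here is a minimal NumPy sketch. It is not taken from the papers. The function names, the shapes, the `P(x) / (1 + |Q(x)|)` form of the rational activation, the use of decoder rows as feature directions, and the plain averaging of reconstructions are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def rational_activation(x, p_coeffs, q_coeffs):
    """Rational function P(x) / (1 + |Q(x)|), an assumed pole-free form.
    In a trainable setting the coefficients would be learned parameters."""
    p = np.polyval(p_coeffs, x)  # coefficients ordered highest degree first
    q = np.polyval(q_coeffs, x)
    return p / (1.0 + np.abs(q))

def sae_forward(x, W_enc, b_enc, W_dec, b_dec):
    """Plain ReLU SAE: returns (feature activations, reconstruction)."""
    z = np.maximum(0.0, (x - b_dec) @ W_enc + b_enc)
    return z, z @ W_dec + b_dec

def ensemble_reconstruct(x, saes):
    """Average the reconstructions of several SAEs (a simple, assumed
    way to combine them; the paper may combine them differently)."""
    return np.mean([sae_forward(x, *params)[1] for params in saes], axis=0)

def cosine_scores(x, W_dec):
    """Cosine similarity between each input row and each decoder direction.
    Depends only on direction, so rescaling x leaves the scores unchanged."""
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    d_unit = W_dec / np.linalg.norm(W_dec, axis=1, keepdims=True)
    return x_unit @ d_unit.T

def make_random_sae(seed, d=8, m=16):
    """Hypothetical helper: an untrained SAE with random weights."""
    rng = np.random.default_rng(seed)
    return (rng.normal(size=(d, m)), np.zeros(m),
            rng.normal(size=(m, d)), np.zeros(d))

# Toy data: 4 activation vectors of width 8, and 3 differently seeded SAEs.
x = np.random.default_rng(0).normal(size=(4, 8))
saes = [make_random_sae(seed) for seed in (1, 2, 3)]

x_hat = ensemble_reconstruct(x, saes)   # shape (4, 8)
scores = cosine_scores(x, saes[0][2])   # shape (4, 16), values in [-1, 1]
acts = rational_activation(x, [0.5, 1.0, 0.0], [1.0, 0.0])
```

Scaling `x` by any positive constant leaves `scores` unchanged, while the raw activations returned by `sae_forward` grow with the input norm. This is the contrast between directional alignment and activation magnitude that the summary describes.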

IMPACT These advancements in Sparse Autoencoders could lead to more interpretable AI models, improving debugging and understanding of complex neural networks.

Soham Gadgil
Sparse Autoencoders
ConvNeXt
cosine similarity
Rational Sparse Autoencoder
JumpReLU
PBK
Remez exchange
FGVC-Aircraft dataset
ReLU