Sparse Autoencoders

PulseAugur coverage of Sparse Autoencoders — every cluster mentioning Sparse Autoencoders across labs, papers, and developer communities, ranked by signal.

Total: 47 over 90d
Releases: 0 over 90d
Papers: 47 over 90d

TOPICS

paper 47
model release 23
other 15
safety 13
infra 1
product 1

RELATIONSHIPS

instance of Saessolsheim 90%
used by Saessolsheim 70%
used by Gotit.pub 70%

TIMELINE

2026-05-25 · Research milestone: Researchers published a paper detailing a new method for multilingual language steering in LLMs using sparse autoencoders.
2026-05-21 · Research milestone: Researchers published a paper detailing a new method for multilingual steering in LLMs using sparse autoencoders.

SENTIMENT · 30D

13 days with sentiment data

RECENT · PAGE 1/3 · 47 TOTAL

RESEARCH · CL_111220 · Jun 25 · 15:59

LLMs improved for forecasting via feature steering

Researchers have developed a method to improve the generalization capabilities of Large Language Models (LLMs) in forecasting tasks. By analyzing LLM internal states with sparse autoencoders, they identified features re…

RESEARCH · CL_107742 · Jun 23 · 15:39

New research explores sparse autoencoders for AI interpretability and generalization

Researchers are exploring sparse autoencoders (SAEs) for interpreting complex language and vision models. One paper introduces Qwen3-Instruct SAEs for various Qwen3 model sizes, demonstrating their use in steering model…

RESEARCH · CL_106825 · Jun 22 · 15:05

New research probes interpretability of AI location encoders · 2 sources tracked

Two new research papers explore the interpretability and spatial effect capture of location encoders used in machine learning. The first paper analyzes geographic implicit neural representations, decomposing location em…

TOOL · CL_105118 · Jun 22 · 14:59

Chemical language models' internal representations analyzed with sparse autoencoders

A new research paper explores the internal workings of chemical language models (cLMs) by applying sparse autoencoders (SAEs) to MolFormer. The study reveals that early layers of the model focus on syntactic patterns an…

RESEARCH · CL_98104 · Jun 16 · 18:28

New framework certifies interpretability of Sparse Autoencoders in language models

Researchers have developed a new framework to certify the interpretability of Sparse Autoencoders (SAEs) when used with language models. This framework establishes an upper bound on the risk of a language model by using…

RESEARCH · CL_95864 · Jun 16 · 09:22

New research tackles VLM hallucinations, distillation, and interpretability

Researchers are developing new methods to improve the capabilities and reliability of vision-language models (VLMs). One approach, DCLA, focuses on mitigating hallucinations by ensuring consistency across different laye…

TOOL · CL_93358 · Jun 16 · 04:00

New CSAE Method Unlocks Hierarchical Visual Concepts in LLMs

Researchers have developed cascaded sparse autoencoders (CSAEs) to better interpret the visual representations within multimodal large language models (MLLMs). Unlike previous methods that produced flat feature dictiona…

RESEARCH · CL_98012 · Jun 16 · 00:00

AI model interventions unreliable, new research finds

A new research paper demonstrates that interventions designed to suppress undesirable behaviors in AI models by manipulating Sparse Autoencoder (SAE) features are unreliable. The study shows that even when specific SAE …

TOOL · CL_91333 · Jun 15 · 05:06

New AI method audits protein models for hazardous designs

Researchers have developed VFUSE, a novel approach using Sparse Autoencoders (SAEs) to interpret generative protein models like RoseTTAFold3 and RFDiffusion3. This method aims to identify and understand features associa…

TOOL · CL_91442 · Jun 15 · 04:00

New method improves neural network interpretability by addressing dense activations

Researchers have proposed a new method to improve the interpretability of neural networks by questioning the assumption that all activation content can be sparsely decomposed. They hypothesize that activations contain a…

RESEARCH · CL_84409 · Jun 10 · 14:32

Sparse autoencoders show unstable features form reproducible subspaces

Researchers have investigated the reproducibility of features learned by sparse autoencoders (SAEs), a common tool for interpreting neural network representations. Their study reveals that while individual features can …

RESEARCH · CL_91462 · Jun 10 · 00:00

New research enhances sparse autoencoder interpretability and robustness

Researchers are exploring new methods to improve the interpretability and robustness of sparse autoencoders (SAEs). One approach, GRILL, aims to reveal hidden vulnerabilities in autoencoders by restoring degraded gradie…

TOOL · CL_79921 · Jun 9 · 04:00

AI concept learning unified by geometric framework

Researchers have developed a geometric framework that unifies supervised and unsupervised concept learning in AI models. This approach views both Concept Bottleneck Models (CBMs) and Sparse Autoencoders (SAEs) as learni…

TOOL · CL_77263 · Jun 8 · 04:00

New ViSAE toolbox interprets and steers Vision Transformer models

Researchers have developed ViSAE, a new toolbox designed to interpret and steer the behavior of Vision Transformers (ViTs). Inspired by neuroscience, ViSAE uses sparse autoencoders to decompose ViT representations into …

RESEARCH · CL_79165 · Jun 7 · 07:54

New framework enhances LLM interpretability with self-correcting explanations

Researchers have introduced SAEExplainer, a new framework designed to improve the interpretability of Sparse Autoencoders (SAEs) within large language models. This method uses activation scores as a reward signal to ena…

TOOL · CL_68436 · Jun 3 · 04:00

New metric measures LLM ideological depth and refusal causes

Researchers have introduced a new metric called "ideological depth" to measure the internal political representations within large language models. This metric assesses a model's ability to follow political instructions…

RESEARCH · CL_68434 · Jun 3 · 04:00

LLM research probes in-context learning mechanisms

Two new research papers explore the mechanisms behind in-context learning in large language models. One paper investigates whether transformer activations can be used to optimize in-context sample selection, finding tha…

TOOL · CL_65820 · Jun 2 · 04:00

Sparse autoencoders enable interpretable emotion control in TTS

Researchers have developed a new method for controlling emotions in text-to-speech (TTS) systems by utilizing sparse autoencoders (SAEs) to identify and manipulate latent features within large language models. This appr…

RESEARCH · CL_66057 · Jun 1 · 15:34

New theory explains how Sparse Autoencoders structure interpretable representations

A new research paper explores the theoretical underpinnings of Sparse Autoencoders (SAEs), a technique used to interpret complex neural network representations. The study proposes a framework to understand what SAEs ext…

RESEARCH · CL_65976 · Jun 1 · 10:50

Research questions stability of Archetypal SAEs for concept extraction

A new research paper challenges the stability claims of Archetypal Sparse Autoencoders (SAEs), a method designed for more reliable concept extraction in neural networks. The study demonstrates that the reported stabilit…
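
Several items above (feature steering for forecasting, the Qwen3-Instruct SAEs used for steering, and the study finding SAE-feature suppression unreliable) build on the same mechanics: an SAE encodes a model activation into sparse, non-negative feature activations and decodes them back through a dictionary of feature directions. Below is a minimal, hypothetical sketch in plain Python, assuming the common ReLU SAE form f = ReLU(W_enc(x - b_dec) + b_enc), x_hat = W_dec f + b_dec, trained with reconstruction MSE plus an L1 penalty. TinySAE, sae_loss, ablate_feature and steer are illustrative names, not code from any listed paper, and the weights are random rather than trained.

```python
import math
import random


class TinySAE:
    """Hypothetical toy ReLU sparse autoencoder with random (untrained) weights.

    Encode: f[j] = ReLU(W_enc[j] . (x - b_dec) + b_enc[j])
    Decode: x_hat = sum_j f[j] * W_dec[j] + b_dec
    W_dec[j] is feature j's direction in activation space.
    """

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        self.d_model = d_model
        self.n_features = n_features
        # Unit-norm decoder directions, one row per feature.
        self.W_dec = []
        for _ in range(n_features):
            v = [rng.gauss(0.0, 1.0) for _ in range(d_model)]
            norm = math.sqrt(sum(a * a for a in v))
            self.W_dec.append([a / norm for a in v])
        # Encoder initialised as a copy of the decoder directions (an assumed init).
        self.W_enc = [row[:] for row in self.W_dec]
        self.b_enc = [0.0] * n_features
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        centered = [xi - bi for xi, bi in zip(x, self.b_dec)]
        return [
            max(0.0, sum(w * c for w, c in zip(self.W_enc[j], centered)) + self.b_enc[j])
            for j in range(self.n_features)
        ]

    def decode(self, f):
        out = list(self.b_dec)
        for fj, direction in zip(f, self.W_dec):
            for i in range(self.d_model):
                out[i] += fj * direction[i]
        return out


def sae_loss(sae, x, l1_coeff=1e-3):
    """Reconstruction MSE plus an L1 sparsity penalty on feature activations."""
    f = sae.encode(x)
    x_hat = sae.decode(f)
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return mse + l1_coeff * sum(f)  # f >= 0 after ReLU, so sum(f) is its L1 norm


def ablate_feature(sae, x, j):
    """Suppress feature j: zero it, decode, and add back the SAE's reconstruction
    error so that only feature j's contribution is removed from x."""
    f = sae.encode(x)
    error = [a - b for a, b in zip(x, sae.decode(f))]
    f[j] = 0.0
    return [a + e for a, e in zip(sae.decode(f), error)]


def steer(x, direction, alpha):
    """Steering: shift the activation by alpha along a feature's decoder direction."""
    return [xi + alpha * di for xi, di in zip(x, direction)]


if __name__ == "__main__":
    sae = TinySAE(d_model=8, n_features=32, seed=0)
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(8)]
    f = sae.encode(x)
    active = [j for j, v in enumerate(f) if v > 0.0]
    print("active features:", len(active), "of", sae.n_features)
    print("loss:", sae_loss(sae, x))
    if active:
        j = active[0]
        x_suppressed = ablate_feature(sae, x, j)
        delta = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_suppressed)))
        # Equals f[j] here because the W_dec rows are unit norm.
        print("size of suppression edit:", delta)
        x_steered = steer(x, sae.W_dec[j], alpha=4.0)
        print("feature", j, "before/after steering:", f[j], sae.encode(x_steered)[j])
```

Adding the reconstruction error back in ablate_feature is one common way to keep an edit from also discarding whatever the SAE fails to reconstruct; whether such feature-level edits reliably change downstream behavior is what the intervention study above calls into question.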
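
The reproducibility item (individual features that are unstable across runs yet form reproducible subspaces) concerns comparing SAEs trained with different seeds. A common per-feature check, sketched here under the assumption that each SAE is summarised by its decoder directions, matches every feature in one run to its most similar feature in another by cosine similarity. The function names and the 0.9 threshold are illustrative choices, not that paper's protocol, and a per-feature test like this does not capture the subspace-level comparison the study describes.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def best_match_similarity(dec_a, dec_b):
    """For each decoder direction in run A, the highest cosine similarity
    to any decoder direction in run B (hypothetical helper)."""
    return [max(cosine(u, v) for v in dec_b) for u in dec_a]


def fraction_reproduced(dec_a, dec_b, threshold=0.9):
    """Share of run-A features whose best match in run B meets the threshold."""
    sims = best_match_similarity(dec_a, dec_b)
    return sum(1 for s in sims if s >= threshold) / len(sims)


if __name__ == "__main__":
    # Toy decoder directions from two hypothetical training runs.
    run_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    run_b = [[0.0, 1.0, 0.0], [0.6, 0.8, 0.0]]
    print(best_match_similarity(run_a, run_b))
    print(fraction_reproduced(run_a, run_b))
```

Running the match in both directions (A to B and B to A) is a stricter criterion than one direction alone, since several features in one run can share a single best match in the other.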
