Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability

Expander SAEs offer parameter-efficient dictionaries for neural network interpretability

By the PulseAugur editorial team · [1 source] · 2026-07-03 04:00

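Researchers have introduced Expander Sparse Autoencoders (SAEs), a new method that uses parameter-efficient dictionaries to interpret neural network activations. Compared with conventional SAEs, the method substantially reduces the number of learned decoder values, making it easier to scale to large models. Experiments on models including Pythia, Qwen2.5-3B, and Llama 3.2 1B show that Expander SAEs are competitive on the storage-fidelity trade-off, using markedly fewer parameters while retaining a high percentage of recovered cross-entropy (CE) loss.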

Impact: This research could lead to more effective ways to understand and debug large neural networks.

Ranking rationale: This cluster describes a new research paper published on arXiv that details a new method for mechanistic interpretability of neural networks.

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.AI · Rodrigo Mendoza-Smith · 2026-07-03 04:00

Expander Sparse Autoencoders: Parameter-Efficient Dictionaries for Mechanistic Interpretability

arXiv:2607.01799v1. Abstract: Sparse autoencoders (SAEs) decompose internal activations of neural networks into sparse linear combinations of learned features by fitting an overcomplete dictionary $\mathbf{W}\in\mathbb{R}^{m\times n}$ with $m<n$, and inferring…
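
To make the decomposition in the abstract concrete, here is a minimal sketch of a generic sparse dictionary model: an activation x in R^m is approximated as W z, where W has m rows and n > m columns and z has only a few nonzero entries. This is not the Expander construction, which the truncated abstract does not describe. The random dictionary, the top-k encoder, and all function names below are assumptions chosen purely for illustration.

```python
import random

def make_dictionary(m, n, seed=0):
    """Random m x n dictionary (m < n, overcomplete) with unit-norm columns.

    Illustrative only: a real SAE learns these values from activations.
    """
    rng = random.Random(seed)
    cols = []
    for _ in range(n):
        col = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = sum(v * v for v in col) ** 0.5
        cols.append([v / norm for v in col])
    return cols  # stored column-wise: cols[j] is feature j, a vector in R^m

def decode(cols, z):
    """Reconstruct x_hat = W z as a sparse linear combination of columns."""
    m = len(cols[0])
    x_hat = [0.0] * m
    for j, coeff in z.items():  # z is a sparse code: {feature index: coefficient}
        for i in range(m):
            x_hat[i] += coeff * cols[j][i]
    return x_hat

def topk_encode(cols, x, k):
    """Crude stand-in encoder: keep the k features with largest |<w_j, x>|."""
    scores = [(sum(w * v for w, v in zip(col, x)), j) for j, col in enumerate(cols)]
    scores.sort(key=lambda s: abs(s[0]), reverse=True)
    return {j: s for s, j in scores[:k]}

m, n, k = 8, 32, 3                      # activation dim, dictionary size, sparsity
W = make_dictionary(m, n)
z_true = {2: 1.5, 11: -0.7, 27: 2.0}    # only 3 of 32 features are active
x = decode(W, z_true)                   # synthetic "activation"
z_hat = topk_encode(W, x, k)            # sparse code inferred from x
dense_decoder_values = m * n            # values a dense decoder must store
```

In this toy setting the dense decoder stores m * n values, which is the quantity the summary says Expander SAEs reduce. How that reduction is achieved is described in the paper, not in this sketch.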
