Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

Study questions the stability of archetypal SAEs for concept extraction

By the PulseAugur editorial team · [2 sources] · 2026-06-01 10:50

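A new research paper challenges the stability claims made for the archetypal approach to sparse autoencoders (SAEs), a method intended to make concept extraction from neural networks more reliable. The study shows that the reported stability is an artifact of using the same initialization across runs rather than an inherent property of the archetypal constraint. Once this deterministic initialization is removed, the archetypal approach shows no significant stability advantage. The paper also highlights problems in metric design that may complicate the interpretation of endpoint stability.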
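The initialization effect described above can be illustrated with a toy sketch. It is not the paper's experiment, code, or metric: the "training run" is mimicked by a small random drift away from the starting dictionary, the stability score is a simple mean best-match cosine similarity, and every name and parameter here (`random_dictionary`, `perturb`, `stability`, `drift`) is hypothetical. Under those assumptions, two runs that share an initialization score as highly stable without any constraint being involved, while independently initialized runs do not.

```python
import math
import random


def unit(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def random_dictionary(n_atoms, dim, rng):
    """Draw n_atoms random unit vectors (a stand-in for an SAE dictionary)."""
    return [unit([rng.gauss(0.0, 1.0) for _ in range(dim)]) for _ in range(n_atoms)]


def perturb(dictionary, scale, rng):
    """Assumption: mimic a training run as small random drift from the initialization."""
    return [unit([x + scale * rng.gauss(0.0, 1.0) for x in atom]) for atom in dictionary]


def stability(dict_a, dict_b):
    """Mean, over atoms of dict_a, of the best cosine similarity to any atom of dict_b."""
    best = []
    for a in dict_a:
        best.append(max(sum(x * y for x, y in zip(a, b)) for b in dict_b))
    return sum(best) / len(best)


rng = random.Random(0)
n_atoms, dim, drift = 32, 64, 0.02

# Two runs that start from the SAME initialization.
shared_init = random_dictionary(n_atoms, dim, rng)
run_a = perturb(shared_init, drift, rng)
run_b = perturb(shared_init, drift, rng)

# Two runs that start from INDEPENDENT initializations.
run_c = perturb(random_dictionary(n_atoms, dim, rng), drift, rng)
run_d = perturb(random_dictionary(n_atoms, dim, rng), drift, rng)

shared_score = stability(run_a, run_b)
independent_score = stability(run_c, run_d)
print(f"shared initialization:      {shared_score:.3f}")
print(f"independent initialization: {independent_score:.3f}")
```

The best-match score used here lets several atoms match the same counterpart; a one-to-one assignment would be a stricter choice, and which of these is appropriate is exactly the kind of metric-design question the summary mentions.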

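Impact: Calls into question the reliability of a specific interpretability technique, which may affect how researchers analyze neural network features.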

Ranking rationale: The cluster contains a research paper posted on arXiv that discusses a specific methodology in machine learning.


AI-generated summary · Google Gemini · from 2 sources.

Coverage sources [2]

arXiv cs.LG TIER_1 · Michał Brzozowski, Neo Christopher Chung · 2026-06-02 04:00

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

arXiv:2606.02061v1. Abstract: Dictionary learning with sparse autoencoders (SAEs) produces overcomplete bases from neural network activations that are often interpretable and reduces polysemanticity. However, features from SAEs vary substantially across random s…

arXiv cs.LG TIER_1 · Neo Christopher Chung · 2026-06-01 10:50

Ablating Archetypes: The Stability of Archetypal SAEs is an Artifact of Initialization and Metric Design

Dictionary learning with sparse autoencoders (SAEs) produces overcomplete bases from neural network activations that are often interpretable and reduces polysemanticity. However, features from SAEs vary substantially across random seeds -- a problem known as instability. Archetyp…
