PulseAugur
English(EN) C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

New research tackles interpretability challenges in sparse autoencoders · 2 papers

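Two new research papers address challenges that arise when sparse autoencoders (SAEs) are used to interpret large language models. The first introduces C$^2$R (Cross-sample Consistency Regularization) to mitigate feature splitting and absorption, problems that stem from inconsistent latent assignments across samples. The second identifies and addresses cross-modal feature heterogeneity in vision-language models, where the same concept can activate different latent directions depending on whether it is represented in an image embedding or a text embedding.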

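Impact: These papers offer new techniques for improving the interpretability and reliability of AI models, which may help researchers better understand and control the models' internal workings.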

Ranking rationale: Two academic papers published on arXiv that introduce new methods for interpreting AI models.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · Haoran Jin, Xiting Wang, Shijie Ren, Hong Xie, Defu Lian ·

    C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

    arXiv:2606.30609v1 Announce Type: cross Abstract: Sparse Autoencoders (SAEs) are widely used to interpret large language models by decomposing activations into sparse, human-understandable features, but scaling to large dictionaries exposes fundamental challenges. Systematic stud…

  2. arXiv cs.LG TIER_1 English(EN) · Chungpa Lee, Jihoon Kwon, Kyle Min, Jy-yong Sohn ·

    Same Concept, Different Directions: Cross-Modal Feature Heterogeneity in Sparse Autoencoders

    arXiv:2606.29888v1 Announce Type: new Abstract: Vision-language models map images and text into a joint embedding space. However, these embeddings often entangle multiple semantic features, which limits their interpretability and controllability. While sparse autoencoders have em…

  3. arXiv cs.AI TIER_1 English(EN) · Defu Lian ·

    C$^{2}$R: Cross-sample Consistency Regularization Mitigates Feature Splitting and Absorption in Sparse Autoencoders

    Sparse Autoencoders (SAEs) are widely used to interpret large language models by decomposing activations into sparse, human-understandable features, but scaling to large dictionaries exposes fundamental challenges. Systematic studies reveal pervasive feature splitting that fragme…