Understanding Geometric Representations in Self-Supervised Vision Transformers via Subspace Intervention

New framework reveals how Vision Transformers encode geometric information

By PulseAugur Editorial · [2 sources] · 2026-07-02 10:18

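Researchers have developed a new framework for analyzing how self-supervised Vision Transformers (ViTs) encode geometric information. By applying singular value decomposition (SVD) to the weights of linear probes, they found that the pre-training objective significantly affects how features are encoded. Specifically, DINOv2 aligns spatial features so that they are easy to extract, whereas masked autoencoders (MAE) disperse these signals and require broader context. The study also shows that geometric representations are highly compressible, and that geometric accuracy peaks in the intermediate layers before the representation shifts toward semantic abstraction.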
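The idea of inspecting a linear probe through its singular value decomposition can be illustrated with a minimal sketch. This is not the paper's procedure: it assumes synthetic Gaussian features in place of real ViT patch embeddings, a closed-form ridge-regression probe, and hypothetical helper names (`fit_linear_probe`, `truncate_probe`, `mse`). It only shows how keeping the top singular directions of the probe weights lets one check how compressible the probed signal is.

```python
import numpy as np


def fit_linear_probe(features, targets, ridge=1e-3):
    """Closed-form ridge-regression probe so that targets ~= features @ W."""
    dim = features.shape[1]
    gram = features.T @ features + ridge * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets)


def truncate_probe(weights, rank):
    """Keep only the top-`rank` singular directions of the probe weights."""
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def mse(pred, target):
    """Mean squared error over all samples and output channels."""
    return float(np.mean((pred - target) ** 2))


# Synthetic stand-in for patch features (assumption: not real ViT outputs).
rng = np.random.default_rng(0)
num_samples, feat_dim, out_dim, true_rank = 2000, 64, 16, 4
X = rng.normal(size=(num_samples, feat_dim))

# The "geometric" target depends on a rank-4 subspace of the features.
W_true = rng.normal(size=(feat_dim, true_rank)) @ rng.normal(size=(true_rank, out_dim))
Y = X @ W_true + 0.01 * rng.normal(size=(num_samples, out_dim))

# Fit the probe, then measure error when only the top-k directions are kept.
W = fit_linear_probe(X, Y)
errors = {k: mse(X @ truncate_probe(W, k), Y) for k in (1, 4, 16)}

for k, err in errors.items():
    print(f"rank {k:2d}: probe MSE = {err:.6f}")
```

In this toy setup the error at rank 4 stays close to the full-rank error, while rank 1 is clearly worse, which is the kind of behavior one would call a compressible representation. With real models, `X` would be replaced by features taken from a chosen layer and `Y` by a dense geometric target.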

Impact: Provides insights for feature selection and decoder design in Vision Transformers.

Ranking rationale: Academic paper detailing a new method for analyzing the representations of AI models.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV TIER_1 English(EN) · Weichen Zhou, Yawen Zou, Chunzhi Gu, Ran Dong, Haoran Xie, Chao Zhang · 2026-07-03 04:00

Understanding Geometric Representations in Self-Supervised Vision Transformers via Subspace Intervention

arXiv:2607.01987v1 Announce Type: new Abstract: We introduce a controlled subspace intervention framework to investigate how self-supervised Vision Transformers (ViTs) encode dense geometric information. While linear probing is widely used to assess geometric representations, it …
arXiv cs.CV TIER_1 English(EN) · Chao Zhang · 2026-07-02 10:18

Understanding Geometric Representations in Self-Supervised Vision Transformers via Subspace Intervention

We introduce a controlled subspace intervention framework to investigate how self-supervised Vision Transformers (ViTs) encode dense geometric information. While linear probing is widely used to assess geometric representations, it treats features as a black box, failing to disen…
