Researchers have explored the internal computations of Vision Transformers (ViTs) by applying Dynamic Mode Decomposition (DMD). Their findings suggest that contiguous blocks within a ViT can be approximated by a single linear operator applied repeatedly. This linear operator accurately predicts intermediate activations over short spans, particularly in earlier layers and for the 'cls' token, but this local fidelity does not translate to improved performance on downstream tasks.
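To make "a single linear operator applied repeatedly" concrete, here is a minimal sketch under stated assumptions: it fits a matrix A by least squares so that each activation snapshot maps to the next (x_{k+1} ≈ A x_k), then rolls A forward from the first snapshot. The summary does not say which DMD variant, rank truncation, or token selection the researchers used, so this full-rank, exact-DMD-style fit is an assumption; the function names fit_linear_operator and rollout are hypothetical, and a synthetic trajectory from a known linear map stands in for real ViT activations.

```python
import numpy as np


def fit_linear_operator(states: np.ndarray) -> np.ndarray:
    """Least-squares fit of A such that states[k + 1] ~= A @ states[k].

    states has shape (num_steps + 1, d); row k is the activation vector
    (e.g. one token's hidden state) after block k. This full-rank,
    exact-DMD-style fit is an assumption, not the paper's confirmed method.
    """
    X = states[:-1]  # (n, d): x_0 .. x_{n-1}
    Y = states[1:]   # (n, d): x_1 .. x_n
    # Solve X @ A.T = Y in the least-squares sense, then transpose to get A.
    A_T, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return A_T.T


def rollout(A: np.ndarray, x0: np.ndarray, steps: int) -> np.ndarray:
    """Apply the same operator repeatedly: returns x0, A x0, ..., A^steps x0."""
    preds = [x0]
    for _ in range(steps):
        preds.append(A @ preds[-1])
    return np.stack(preds)


if __name__ == "__main__":
    # Synthetic stand-in for ViT activations: a trajectory produced by a
    # known linear map, so the fit should recover it almost exactly.
    rng = np.random.default_rng(0)
    d, steps = 4, 12
    A_true = rng.normal(size=(d, d)) / (2 * np.sqrt(d))
    traj = [rng.normal(size=d)]
    for _ in range(steps):
        traj.append(A_true @ traj[-1])
    states = np.stack(traj)

    A = fit_linear_operator(states)
    preds = rollout(A, states[0], steps)
    print("max rollout error:", np.max(np.abs(preds - states)))
    print("DMD eigenvalues:", np.linalg.eigvals(A))
```

On real activations, which are not exactly linear, the quantity of interest would be how the rollout error grows as the span lengthens, consistent with the summary's note that fidelity holds over short spans; the eigenvalues of A are the DMD eigenvalues summarizing the fitted dynamics.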
Impact: Suggests that ViT computations are approximately linear over short spans of blocks, pointing to potential model compression and efficiency gains.
Ranking rationale: The cluster contains an academic paper detailing a new analytical method for understanding existing models.
AI-generated summary · Google Gemini · from 1 source.