Researchers have explored the internal computations of Vision Transformers (ViTs) by applying Dynamic Mode Decomposition (DMD). Their findings suggest that contiguous blocks within a ViT can be approximated by repeatedly applying a single linear operator. This operator accurately predicts intermediate activations over short spans, particularly in earlier layers and for the CLS token, but this local fidelity does not translate into improved performance on downstream tasks.
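The core DMD idea described above can be sketched in a few lines: collect snapshots of activations across a span of blocks, fit a single linear operator `A` by least squares, and roll it forward to check how well it predicts later activations. This is a minimal toy sketch, not the paper's method; the dimensions, the synthetic stand-in dynamics `W`, and all variable names are assumptions for illustration.

```python
import numpy as np

# Toy sketch of a DMD-style fit: approximate a span of transformer blocks
# by one linear operator A. X[:, t] holds the activation after block t.
rng = np.random.default_rng(0)
d, T = 64, 8                      # feature dim, number of blocks (assumed)
W = np.eye(d) + 0.05 * rng.standard_normal((d, d))  # stand-in block dynamics
snaps = [rng.standard_normal(d)]
for _ in range(T):
    snaps.append(W @ snaps[-1])
X = np.stack(snaps, axis=1)       # shape (d, T+1)

# DMD-style least-squares fit: A minimizes ||X_next - A X_prev||_F.
A = X[:, 1:] @ np.linalg.pinv(X[:, :-1])

# Roll A forward from the first activation and compare to the trajectory.
pred = X[:, 0]
errs = []
for t in range(1, T + 1):
    pred = A @ pred
    errs.append(np.linalg.norm(pred - X[:, t]) / np.linalg.norm(X[:, t]))
# On this exactly-linear toy data the relative errors are near zero; on real
# ViT activations the fit degrades with span length and depth.
```

Because the toy dynamics here are exactly linear, `A` reproduces the trajectory almost perfectly; the paper's observation is that real ViT activations behave this way only approximately and only over short spans.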
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Shows that ViT computations exhibit approximately linear dynamics over short spans, suggesting potential for model compression and efficiency gains.
RANK_REASON The cluster contains an academic paper detailing a new analytical method for understanding existing models.