Researchers have found that specific training techniques can encourage spatial locality in Vision Transformers (ViTs). Under a 'Modern' training protocol that combines data augmentation (CutMix and ColorJitter) with label smoothing, the early layers of ViTs showed more concentrated, local attention patterns. An ablation study identified CutMix as the primary driver of this effect: it significantly reduced the Mean Attention Distance relative to the baseline methods.
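The summary does not define Mean Attention Distance. The sketch below assumes the common definition from the original ViT work: for each head, the attention-weighted Euclidean distance (in pixels) between a query patch and the patches it attends to, averaged over all query patches. Lower values indicate more local attention. The function name `mean_attention_distance`, its arguments, and the assumption that the CLS token has already been dropped from the attention map are illustrative choices, not taken from the paper.

```python
import numpy as np


def mean_attention_distance(attn: np.ndarray, grid_size: int, patch_size: int) -> np.ndarray:
    """Per-head mean attention distance in pixels (hypothetical helper).

    attn: array of shape (num_heads, N, N) with N = grid_size ** 2 patch tokens;
          each row is a softmax distribution over keys (CLS token assumed removed).
    Returns an array of shape (num_heads,).
    """
    n = grid_size * grid_size
    # (row, col) grid position of every patch, scaled to pixel units
    rows, cols = np.divmod(np.arange(n), grid_size)
    coords = np.stack([rows, cols], axis=1).astype(float) * patch_size
    # Pairwise Euclidean distances between patch positions, shape (N, N)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Attention-weighted distance for each (head, query), then average over queries
    per_query = (attn * dist[None, :, :]).sum(axis=-1)
    return per_query.mean(axis=-1)


# Usage: a head that attends only to its own patch has distance 0,
# while uniform attention on a 2x2 grid of 16-pixel patches gives 8 + 4*sqrt(2).
local = np.stack([np.eye(4)])
uniform = np.full((1, 4, 4), 0.25)
print(mean_attention_distance(local, grid_size=2, patch_size=16))
print(mean_attention_distance(uniform, grid_size=2, patch_size=16))
```

Under this definition, a training protocol that concentrates early-layer attention on nearby patches lowers the per-head value. That is the direction of the CutMix effect reported in the ablation.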
Impact: Training protocols such as CutMix can improve the efficiency and interpretability of Vision Transformers by promoting localized attention.
Ranking rationale: The cluster contains an academic paper detailing a new finding in machine learning model training.
AI-generated summary · Google Gemini · From 1 source.