PulseAugur

UniRefiner framework teaches ViTs to discard spurious tokens

Researchers have developed UniRefiner, a framework designed to improve the spatial accuracy of Vision Transformer (ViT) models. This method teaches pre-trained ViTs to identify and discard irrelevant or spurious tokens that can degrade performance on spatially sensitive tasks. By using contrastive registers and a dual objective, UniRefiner refines diverse ViTs with minimal fine-tuning, leading to significant improvements in tasks like semantic segmentation.

Impact: Enhances the spatial reasoning capabilities of foundation vision models, potentially broadening their applicability in dense prediction tasks.

Ranking rationale: The cluster contains an academic paper detailing a new method for improving existing AI models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

  1. arXiv cs.CV · English · Tong Zhang

    UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register

    Representation learning with Vision Transformers (ViTs) has advanced rapidly, yet the utility of large-scale models in spatially sensitive tasks is hindered by spurious tokens. Prior efforts to mitigate this have been limited, often defining these artifacts narrowly, for example,…