Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 1w

UniRefiner: Teaching Pre-trained ViTs to Self-Dispose Dross via Contrastive Register

Researchers have developed UniRefiner, a framework designed to improve the spatial accuracy of Vision Transformer (ViT) models. The method teaches pre-trained ViTs to identify and discard irrelevant or spurious tokens (the "dross" of the title) that can degrade performance on spatially sensitive tasks. Using contrastive registers and a dual objective, UniRefiner refines a range of ViTs with minimal fine-tuning, and the authors report significant improvements on dense tasks such as semantic segmentation.
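The brief does not describe UniRefiner's internals, so the following is only background, not the paper's method. It is a minimal sketch of the generic register-token idea from Darcet et al., "Vision Transformers Need Registers": extra learnable tokens with no image content are appended to the input sequence, and their outputs are dropped before a dense prediction head. It also includes a simple norm-based check for high-norm outlier tokens, the kind of artifact that paper reported in models such as DINOv2. All function names (`add_registers`, `split_outputs`, `high_norm_indices`, `identity_encoder`) and the `factor=3.0` threshold are hypothetical, and the "encoder" is an identity stand-in rather than a real ViT.

```python
import math
import random
import statistics
from typing import List, Tuple

Vector = List[float]


def add_registers(cls_token: Vector, patch_tokens: List[Vector],
                  registers: List[Vector]) -> List[Vector]:
    """Build the encoder input as [CLS] + patch tokens + register tokens.

    Registers carry no image content; the idea is that the model can use them
    as scratch space instead of repurposing low-information patch tokens.
    """
    return [cls_token] + patch_tokens + registers


def split_outputs(tokens: List[Vector], num_patches: int,
                  num_registers: int) -> Tuple[Vector, List[Vector], List[Vector]]:
    """Split encoder outputs back into CLS, patch, and register parts.

    Only the patch outputs would feed a dense head (e.g. segmentation);
    the register outputs are discarded.
    """
    assert len(tokens) == 1 + num_patches + num_registers
    cls_out = tokens[0]
    patch_out = tokens[1:1 + num_patches]
    register_out = tokens[1 + num_patches:]
    return cls_out, patch_out, register_out


def high_norm_indices(tokens: List[Vector], factor: float = 3.0) -> List[int]:
    """Heuristic (hypothetical threshold): flag tokens whose L2 norm exceeds
    `factor` times the median token norm."""
    norms = [math.sqrt(sum(x * x for x in t)) for t in tokens]
    median = statistics.median(norms)
    return [i for i, n in enumerate(norms) if n > factor * median]


def identity_encoder(tokens: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a ViT encoder: returns copies of its input."""
    return [list(t) for t in tokens]


if __name__ == "__main__":
    random.seed(0)
    dim, num_patches, num_registers = 8, 16, 4
    cls_tok = [0.0] * dim
    patches = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_patches)]
    patches[5] = [10.0] * dim  # inject an artificial high-norm "artifact" token
    regs = [[random.gauss(0, 0.02) for _ in range(dim)] for _ in range(num_registers)]

    seq = add_registers(cls_tok, patches, regs)
    out = identity_encoder(seq)
    cls_out, patch_out, reg_out = split_outputs(out, num_patches, num_registers)
    print(len(seq), len(patch_out), len(reg_out))  # 21 16 4
    print("high-norm patch indices:", high_norm_indices(patch_out))
```

In a real model the registers would be learned parameters and the encoder a stack of transformer blocks; how UniRefiner makes its registers "contrastive" and what its dual objective is are not covered by this brief.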

IMPACT: Enhances the spatial reasoning capabilities of foundation vision models, potentially broadening their applicability in dense prediction tasks.

DINOv2
ViTs
InternViT-6B
UniRefiner
EVA-CLIP-8B