Brief · PulseAugur

TOOL · arXiv cs.LG · English(EN)

Spatial Priors via Space Filling Curves for Small and Limited Data Vision Transformers

Researchers have developed VIOLIN, a masked attention mechanism for Vision Transformers (ViTs) that improves performance when training data is limited or model capacity is small. VIOLIN encodes spatial structure through Space Filling Curves (SFCs) and adds few parameters and little computational overhead. The authors report accuracy gains of up to 8.7% on tasks that require spatial information and up to 7.2% on pixel-level tasks, with benefits in both fine-tuning and pre-training scenarios.
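The brief does not say how VIOLIN turns a space filling curve into an attention mask. As a rough illustration of the general idea only (not the paper's method), the sketch below orders a square grid of ViT patches along a Hilbert curve and builds a banded mask in which each patch attends only to patches that are close along the curve. The function names `hilbert_d2xy`, `sfc_positions`, `sfc_band_mask`, `to_additive_bias` and the `window` parameter are hypothetical, chosen for this example.

```python
from typing import List, Tuple


def hilbert_d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map position d along a Hilbert curve to (x, y) on an n x n grid.

    Standard iterative conversion; n must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the sub-square so the curve stays continuous.
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def sfc_positions(n: int) -> List[int]:
    """Curve position of each patch, indexed in raster order (row * n + col)."""
    pos = [0] * (n * n)
    for d in range(n * n):
        x, y = hilbert_d2xy(n, d)
        # Treat x as the patch column and y as the patch row.
        pos[y * n + x] = d
    return pos


def sfc_band_mask(n: int, window: int) -> List[List[bool]]:
    """Hypothetical mask: patch i may attend to patch j if their curve
    positions differ by at most `window`."""
    pos = sfc_positions(n)
    m = n * n
    return [[abs(pos[i] - pos[j]) <= window for j in range(m)] for i in range(m)]


def to_additive_bias(mask: List[List[bool]]) -> List[List[float]]:
    """Convert a boolean mask into an additive attention bias:
    0.0 where attention is allowed, -inf where it is blocked."""
    return [[0.0 if ok else float("-inf") for ok in row] for row in mask]


if __name__ == "__main__":
    # A 4 x 4 patch grid: 16 tokens, each attending to its curve neighbors.
    mask = sfc_band_mask(4, window=1)
    for row in mask:
        print("".join("#" if ok else "." for ok in row))
```

Because consecutive Hilbert positions are always adjacent grid cells, a band along the curve keeps each token's attention local in 2D while using only a 1D index. The approximation is imperfect, since some 2D neighbors land far apart on the curve. The Hilbert construction also assumes a power-of-two grid side, so a 224 x 224 image with 16 x 16 patches (a 14 x 14 grid) would need padding or a different curve.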

IMPACT · Improves Vision Transformer performance on limited data, potentially broadening the use of ViTs in resource-constrained environments.

LoRA
CIFAR-100
VTAB-1K
Vision Transformers
ImageNet-1K
VIOLIN
Space Filling Curves
Leyla Naz Candogan