Beyond Model Size: Probing the Gaps in Visual in-Context Learning by Training a Tiny Model
Two new research papers published on arXiv explore the effectiveness of visual in-context learning (VICL). One paper challenges the notion that large models and extensive data are essential for VICL by training a tiny model with only 1 million parameters and 70,000 images. The other paper introduces VIBE, a comprehensive benchmark designed to evaluate VICL models across diverse domains and tasks, highlighting limitations in how adaptation capability is currently assessed.
AI IMPACT: Highlights potential for smaller models in visual adaptation and calls for improved benchmarking in the field.