Ultra-tiny Vision Transformer designed for mobile deployment

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

Researchers have developed UtVAA, an ultra-tiny Vision Transformer architecture optimized for mobile and edge devices. The model introduces Affix Attention, which combines local feature extraction with linear self-attention and coordinate attention for spatial modeling. It also uses Dilated Bottleneck blocks to expand receptive fields efficiently. The smallest variant has over 200,000 parameters and requires 53 million FLOPs, yet achieves competitive accuracy on benchmark datasets such as CIFAR-10 and CIFAR-100. The results suggest that transformer-based vision models can be made significantly smaller without substantial performance loss.
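To make the receptive-field point concrete, the short Python sketch below shows how dilation widens what a stack of convolutions can see without adding weights. It is a generic illustration of dilated convolutions, not the paper's Dilated Bottleneck block: the layer configurations, channel counts, and function names are all hypothetical and chosen only for the example.

```python
def receptive_field(layers):
    """Receptive field along one axis of stacked stride-1 convolutions.

    `layers` is a list of (kernel_size, dilation) pairs. Each layer widens
    the field by dilation * (kernel_size - 1).
    """
    rf = 1
    for kernel_size, dilation in layers:
        rf += dilation * (kernel_size - 1)
    return rf


def conv_weight_count(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution, bias excluded.

    Dilation is not an argument: it spaces the kernel taps apart
    but does not add any weights.
    """
    return kernel_size * kernel_size * in_channels * out_channels


# Hypothetical three-layer stacks of 3x3 convolutions (not from the paper).
plain = [(3, 1), (3, 1), (3, 1)]
dilated = [(3, 1), (3, 2), (3, 4)]

print(receptive_field(plain))        # 7
print(receptive_field(dilated))      # 15
print(conv_weight_count(3, 16, 16))  # 2304, the same per layer for both stacks
```

Under these assumed settings, the dilated stack covers a 15-pixel span against 7 for the plain stack, at an identical weight count per layer. That is the general trade-off dilated designs exploit.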



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Romiyal George, Sathiyamohan Nishankar, Selvarajah Thuseethan, Roshan G. Ragel · 2026-06-16 04:00

UtVAA: Ultra-tiny Vision Transformer with Affix Attention for Mobile Image Classification

arXiv:2606.14735v1. Abstract: Vision Transformers (ViTs) have demonstrated strong representation capability in image classification. However, their quadratic self-attention complexity and large parameter counts limit deployment on resource-constrained mobile and…
