Researchers have developed UtVAA, an ultra-tiny Vision Transformer architecture optimized for mobile and edge devices. The model incorporates Affix Attention, which combines local feature extraction with linear self-attention and coordinate attention for spatial modeling. UtVAA also uses Dilated Bottleneck blocks to expand receptive fields efficiently. The smallest variant has over 200,000 parameters and requires 53 million FLOPs, yet achieves competitive accuracy on benchmark datasets such as CIFAR-10 and CIFAR-100. The results indicate that transformer-based vision models can be made significantly smaller without substantial performance loss.
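The claim that dilation expands receptive fields efficiently can be illustrated with simple arithmetic. The sketch below is not taken from the UtVAA paper: the 3x3 kernel, the 16-channel width, and the dilation rates are illustrative assumptions, and the helper names are hypothetical. It shows only the general property of dilated convolutions, namely that the spatial extent covered by a kernel grows with the dilation rate while the number of weights stays the same.

```python
def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Spatial extent covered by one dilated kernel along one axis."""
    return dilation * (kernel_size - 1) + 1


def conv_weight_count(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Weights in a standard 2D convolution (bias excluded).

    Dilation does not appear here: it spaces the taps apart
    without adding any new ones.
    """
    return in_channels * out_channels * kernel_size * kernel_size


# Illustrative values only; the paper's actual settings are not given in the summary.
KERNEL_SIZE = 3
CHANNELS = 16

for dilation in (1, 2, 4):
    extent = effective_kernel_size(KERNEL_SIZE, dilation)
    weights = conv_weight_count(CHANNELS, CHANNELS, KERNEL_SIZE)
    print(f"dilation={dilation}: covers {extent}x{extent} pixels with {weights} weights")
```

Under these assumptions, the weight count is identical for every dilation rate, which is why dilation is a common choice when the parameter budget is very tight.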
- Affix Attention
- CIFAR-10
- CIFAR-100
- Dilated Bottleneck
- PlantVillage-Tomato
- Selvarajah Thuseethan
- SLIF-Tomato
- UtVAA
- Vision Transformers
AI-generated summary · Google Gemini · from 1 source.