Vision Transformers and CNNs Compared for Land Use Classification

By PulseAugur Editorial · 1 source · 2026-05-20 14:57

A new research paper compares Vision Transformers (ViTs) and Convolutional Neural Networks (CNNs) for land use scene classification from remote sensing imagery. The study evaluates AlexNet, a CNN, and a ViT on the UC Merced Land Use and EuroSAT datasets, reporting accuracy, precision, recall, and F1-score. The results indicate that CNNs are more robust when training data is limited and scenes are dominated by strong local textures. ViTs, by contrast, capture global spatial relationships better when given sufficient training data, but at a higher computational cost.
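
The summary does not say how the paper averages its per-class scores. The sketch below assumes macro averaging, where every scene class is weighted equally, which is a common choice for multi-class land use benchmarks. It shows how the four reported metrics relate. The helper name `classification_metrics` and the toy labels are hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1.

    Hypothetical helper for illustration. Macro averaging (every class
    weighted equally) is an assumption; the paper may average differently.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    # Per-class true positives: samples whose prediction matches the label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # true positives + false positives per class
    actual = Counter(y_true)     # true positives + false negatives per class

    precisions, recalls, f1s = [], [], []
    for c in labels:
        p = tp[c] / predicted[c] if predicted[c] else 0.0
        r = tp[c] / actual[c] if actual[c] else 0.0
        # F1 is the harmonic mean of precision and recall for this class.
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with hypothetical scene labels (not data from the paper).
y_true = ["forest", "forest", "river", "highway", "river", "highway"]
y_pred = ["forest", "river", "river", "highway", "river", "forest"]
m = classification_metrics(y_true, y_pred)
print({k: round(v, 3) for k, v in m.items()})
# → {'accuracy': 0.667, 'precision': 0.722, 'recall': 0.667, 'f1': 0.656}
```

In this toy run the model over-predicts `river`. That class keeps perfect recall, but its precision and F1 fall. Reporting per-class precision and recall alongside accuracy exposes this kind of imbalance.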

IMPACT Provides insights for selecting appropriate deep learning models for remote sensing land use classification tasks.

RANK_REASON Academic paper presenting a comparative analysis of two deep learning architectures for a specific task.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV · English · Arun D. Kulkarni · 2026-05-20 14:57

Vision Transformers and Convolutional Neural Networks for Land Use Scene Classification

Land Use Scene Classification (LUSC) from remote sensing imagery plays a critical role in environmental monitoring, urban planning, and sustainable resource management. In recent years, deep learning methods have significantly advanced the state of the art, with Convolutional Neu…