TOOL · arXiv cs.CV English(EN) · 8h

ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT

Researchers have introduced ToaSt, a framework for making Vision Transformers (ViTs) more computationally efficient. ToaSt decouples the compression strategy by component: it applies head-wise structured pruning to the attention modules and a training-free method called Token Channel Selection (TCS) to the feed-forward networks (FFNs). The authors report improved accuracy-efficiency trade-offs across several models and downstream tasks, including image classification, detection, and segmentation.
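
To make "head-wise structured pruning" concrete, here is a minimal sketch of the general idea: whole attention heads are scored and the weakest are removed, so the weight matrices shrink by entire blocks. It is not ToaSt's actual criterion, which this brief does not describe. The scoring rule (L2 norm of each head's rows in the output projection), the row-major weight layout, and the function names `head_scores`, `select_heads`, and `keep_head_rows` are all assumptions made for illustration.

```python
import math
from typing import List

Matrix = List[List[float]]  # matrix[row][col]; rows are grouped by head (assumed layout)


def head_scores(w_out: Matrix, num_heads: int) -> List[float]:
    """Score each head by the L2 norm of its rows (an illustrative criterion only)."""
    head_dim = len(w_out) // num_heads
    scores = []
    for h in range(num_heads):
        rows = w_out[h * head_dim:(h + 1) * head_dim]
        scores.append(math.sqrt(sum(v * v for row in rows for v in row)))
    return scores


def select_heads(scores: List[float], keep: int) -> List[int]:
    """Return the indices of the `keep` highest-scoring heads, in original order."""
    ranked = sorted(range(len(scores)), key=lambda h: scores[h], reverse=True)
    return sorted(ranked[:keep])


def keep_head_rows(w: Matrix, kept: List[int], head_dim: int) -> Matrix:
    """Drop every row that belongs to a pruned head; kept heads stay contiguous."""
    return [row for h in kept for row in w[h * head_dim:(h + 1) * head_dim]]


# Toy example: 3 heads, head_dim = 2, d_model = 2.
w_out = [
    [0.1, 0.0], [0.0, 0.1],  # head 0: small weights
    [1.0, 2.0], [2.0, 1.0],  # head 1: large weights
    [0.5, 0.5], [0.5, 0.5],  # head 2: medium weights
]
scores = head_scores(w_out, num_heads=3)
kept = select_heads(scores, keep=2)
pruned_w_out = keep_head_rows(w_out, kept, head_dim=2)
```

Because entire heads are removed, the pruned matrix is simply a smaller dense matrix, which is why structured pruning tends to translate into real speedups without sparse kernels. In a full attention layer, the same head indices would also have to be removed from the query, key, and value projections.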

IMPACT By reducing the computational cost of Vision Transformers, this approach could enable wider deployment of these models in resource-constrained environments.

CIFAR-100
COCO
ADE20K
Swin Transformer
Vision Transformers
ToaSt
ViT-MAE
ViT-MAE-Huge
Cheonjun Park