ToaSt: Token Channel Selection and Structured Pruning for Efficient ViT
Researchers have developed ToaSt, a framework designed to make Vision Transformers (ViTs) more computationally efficient. ToaSt decouples the compression strategy by architectural component: attention modules receive head-wise structured pruning, while the Feed-Forward Networks (FFNs) receive a training-free method called Token Channel Selection (TCS). The approach has demonstrated improved accuracy-efficiency trade-offs across various models and on downstream tasks, including image classification, detection, and segmentation.
AI IMPACT: This research offers a novel approach to reducing the computational cost of Vision Transformers, potentially enabling wider deployment of these models in resource-constrained environments.