Vision Transformers leverage DCT for improved attention and efficiency

By PulseAugur Editorial · [1 source] · 2026-05-04 04:00

Researchers propose two uses of the Discrete Cosine Transform (DCT) in Vision Transformers. The first is a DCT-based initialization strategy for self-attention, which the authors report improves classification accuracy on benchmarks such as CIFAR-10 and ImageNet-1K. The second is a DCT-based attention compression technique that lowers computational overhead by truncating the high-frequency components of input patches while maintaining performance in models such as the Swin Transformer.
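
To make the truncation idea concrete, here is a minimal NumPy sketch of dropping the high-frequency DCT coefficients of a square patch. It is an illustration under stated assumptions, not the paper's implementation: the function names (dct_matrix, truncate_patch, reconstruct_patch), the 8x8 patch size, and the choice to keep a 4x4 low-frequency block are all hypothetical, and how the authors feed such coefficients into attention is not described in this summary.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II matrix of size n x n.

    Row k holds the k-th cosine basis vector, so row 0 is the DC term
    and higher rows correspond to higher spatial frequencies.
    """
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] /= np.sqrt(2.0)  # rescale the DC row so mat @ mat.T is the identity
    return mat


def truncate_patch(patch: np.ndarray, keep: int) -> np.ndarray:
    """Hypothetical helper: keep the keep x keep lowest-frequency 2D DCT coefficients."""
    c = dct_matrix(patch.shape[0])
    coeffs = c @ patch @ c.T  # separable 2D DCT-II of the patch
    return coeffs[:keep, :keep]


def reconstruct_patch(low: np.ndarray, size: int) -> np.ndarray:
    """Hypothetical helper: zero-pad the kept coefficients and invert the 2D DCT."""
    c = dct_matrix(size)
    padded = np.zeros((size, size))
    padded[: low.shape[0], : low.shape[1]] = low
    return c.T @ padded @ c  # inverse transform, valid because c is orthonormal


# Assumed example: a smooth 8x8 patch (a diagonal intensity ramp).
ramp = np.add.outer(np.arange(8.0), np.arange(8.0)) / 14.0
low = truncate_patch(ramp, keep=4)  # 16 coefficients instead of 64 pixels
approx = reconstruct_patch(low, size=8)
max_error = float(np.max(np.abs(ramp - approx)))
```

In this sketch a smooth patch is represented by a quarter of its coefficients, and max_error measures what the truncation discards. Shorter per-patch representations are one plausible source of the reduced attention cost the summary describes, though the exact mechanism used in the paper is an assumption here.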

IMPACT Introduces methods to reduce computational costs and improve accuracy in Vision Transformers, potentially enabling wider deployment.

RANK_REASON Academic paper introducing novel techniques for improving Vision Transformer efficiency and performance.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci · 2026-05-04 04:00

Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers

arXiv:2405.13901v4 · Announce Type: replace

Abstract: Self-attention is central to the success of Transformer architectures; however, learning the query, key, and value projections from random initialization remains challenging and computationally expensive. In this paper, we propo…
