PulseAugur

Vision Transformers leverage DCT for improved attention and efficiency

Researchers have proposed using the Discrete Cosine Transform (DCT) to enhance Vision Transformers in two ways. First, a DCT-based initialization strategy for self-attention improves classification accuracy on benchmarks such as CIFAR-10 and ImageNet-1K. Second, a DCT-based attention compression technique reduces computational overhead by truncating the high-frequency components of input patches while maintaining performance in models such as the Swin Transformer.
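As a rough illustration of the truncation idea in the summary, the sketch below builds an orthonormal DCT-II matrix in NumPy and keeps only the lowest-frequency coefficients of each token vector. The function names (`dct_matrix`, `truncate_high_freq`), the choice to transform along the feature dimension, and the `keep` parameter are assumptions made for this example; they are not taken from the paper, whose exact formulation may differ.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II matrix C of shape (n, n), with C @ C.T == I."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return c


def truncate_high_freq(x: np.ndarray, keep: int) -> np.ndarray:
    """Project features onto the `keep` lowest DCT frequencies.

    x has shape (num_tokens, dim); the result has shape (num_tokens, keep).
    Hypothetical helper: it only illustrates dropping high-frequency terms.
    """
    c = dct_matrix(x.shape[-1])
    return x @ c[:keep].T


# Toy input: 16 tokens with 64 features each (random stand-in data).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 64))
compressed = truncate_high_freq(tokens, keep=32)
```

Because the DCT matrix is orthonormal, keeping every coefficient loses nothing, and any saving comes purely from the smaller feature width passed on to attention. How the same matrix would serve as an initial value for the query, key, and value projections is not spelled out in the summary, so that part is left out of the sketch.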

Impact: Introduces methods to reduce computational cost and improve accuracy in Vision Transformers, potentially enabling wider deployment.

Ranking rationale: Academic paper introducing novel techniques for improving Vision Transformer efficiency and performance.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.CV · TIER_1 · English (EN) · Hongyi Pan, Emadeldeen Hamdan, Xin Zhu, Ahmet Enis Cetin, Ulas Bagci

    Discrete Cosine Transform Based Decorrelated Attention for Vision Transformers

    arXiv:2405.13901v4 Announce Type: replace Abstract: Self-attention is central to the success of Transformer architectures; however, learning the query, key, and value projections from random initialization remains challenging and computationally expensive. In this paper, we propo…