Researchers have proposed using the Discrete Cosine Transform (DCT) to improve Vision Transformers in two ways. First, a DCT-based initialization strategy for self-attention improves classification accuracy on benchmarks such as CIFAR-10 and ImageNet-1K. Second, a DCT-based attention compression technique reduces computational overhead by truncating the high-frequency components of input patches, while maintaining performance in models such as the Swin Transformer.
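To make the truncation idea concrete, here is a minimal NumPy sketch of discarding the high-frequency DCT coefficients of a single square patch. It is an illustration only, not the paper's implementation: the summary does not say how the truncated coefficients are fed into attention, and the helper names (`dct_matrix`, `truncate_patch`, `reconstruct_patch`) and the `keep` parameter are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis as an (n, n) matrix; row k holds frequency k."""
    k = np.arange(n)[:, None]   # frequency index
    i = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0, :] = np.sqrt(1.0 / n)  # the DC row uses a different scale factor
    return m


def truncate_patch(patch: np.ndarray, keep: int) -> np.ndarray:
    """Return the keep x keep block of lowest-frequency 2D DCT coefficients.

    Assumption: 'truncating high-frequency components' is modeled here as
    keeping only the top-left corner of the coefficient matrix.
    """
    d = dct_matrix(patch.shape[0])
    coeffs = d @ patch @ d.T    # separable 2D DCT of a square patch
    return coeffs[:keep, :keep]


def reconstruct_patch(coeffs: np.ndarray, n: int) -> np.ndarray:
    """Zero-pad the kept coefficients back to n x n and invert the DCT."""
    d = dct_matrix(n)
    keep = coeffs.shape[0]
    padded = np.zeros((n, n))
    padded[:keep, :keep] = coeffs
    return d.T @ padded @ d     # the inverse of an orthonormal transform is its transpose


# Usage: a 16x16 patch reduced to 8x8 coefficients, i.e. 4x fewer values.
rng = np.random.default_rng(0)
patch = rng.standard_normal((16, 16))
compressed = truncate_patch(patch, keep=8)
approx = reconstruct_patch(compressed, n=16)
```

In this sketch the saving comes from the smaller representation: a 16x16 patch becomes 64 values instead of 256, so any layer that consumes it works on a shorter input. How much accuracy survives depends on how much of the patch's energy sits in the low frequencies.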
Impact: Introduces methods that reduce computational cost and improve accuracy in Vision Transformers, potentially enabling wider deployment.
Ranking rationale: Academic paper introducing novel techniques for improving Vision Transformer efficiency and performance.
AI-generated summary · Google Gemini · from 1 source.