PulseAugur
Chinese (ZH) ICML 2026 REViT Released | This Might Be the Last Dignity of CNNs in the Transformer Era

REViT imbues Vision Transformers with rotation equivariance without position encoding

Researchers have developed REViT, an approach that gives Vision Transformers (ViTs) rotation and reflection equivariance without relying on complex position encodings. Using a 'Lifting' layer and Group Convolutional Self-Attention (G-CSA), REViT processes input images in a higher-dimensional space that inherently captures directional information. The method reportedly outperforms traditional methods and standard ViTs on various datasets in both accuracy and efficiency.
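
To make the idea of a lifting layer concrete, here is a minimal sketch of lifting an image to the group of 90-degree rotations (C4) and checking equivariance numerically. It is an illustration under stated assumptions, not REViT's implementation: the function names `correlate_valid` and `lift_c4`, the restriction to four rotations (no reflections), and the toy sizes are all hypothetical, and G-CSA itself is not modelled.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def correlate_valid(x, k):
    """'Valid' 2-D cross-correlation of image x with filter k."""
    windows = sliding_window_view(x, k.shape)  # (H-n+1, W-n+1, n, n)
    return np.einsum("ijab,ab->ij", windows, k)


def lift_c4(x, k):
    """Hypothetical lifting layer: one feature map per 90-degree rotation of k.

    Returns an array of shape (4, H-n+1, W-n+1); axis 0 indexes the rotation.
    """
    return np.stack([correlate_valid(x, np.rot90(k, r)) for r in range(4)])


rng = np.random.default_rng(0)
x = rng.standard_normal((7, 7))  # toy single-channel image (assumed size)
k = rng.standard_normal((3, 3))  # toy filter with random weights

lifted = lift_c4(x, k)
lifted_of_rotated = lift_c4(np.rot90(x), k)

# Equivariance check: rotating the input should rotate every feature map
# spatially and cyclically shift the rotation axis by one step.
expected = np.roll(np.rot90(lifted, axes=(1, 2)), 1, axis=0)
print(np.allclose(lifted_of_rotated, expected))
```

The extra leading axis is the "higher-dimensional space" in this toy setting: direction is stored as an index along that axis, so no position encoding is needed to tell orientations apart.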

IMPACT This research could lead to more robust AI models in areas like medical imaging and autonomous driving by improving their handling of spatial variations.

RANK_REASON The item describes a new research paper proposing a novel method for Vision Transformers.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. 雷峰网 (Leiphone) TIER_1 Chinese (ZH)

    ICML 2026 REViT Released | This Might Be the Last Dignity of CNNs in the Transformer Era

    Original author: WeChat official account “集智实验室”. Original link: https://mp.weixin.qq.com/s/A55BBhD3e_s3VVC7mw1JNw. Reposted by 雷峰网 (Leiphone). …