bViT uses single-block recurrence for parameter-efficient vision transformers

By PulseAugur Editorial Team · [1 source] · 2026-05-11 14:43

Researchers have developed bViT, a Vision Transformer (ViT) architecture for image recognition that applies a single transformer block repeatedly instead of stacking independently parameterized blocks. This recurrent design reaches accuracy comparable to standard ViTs on ImageNet-1K with significantly fewer parameters. The study suggests that a substantial portion of a ViT's depth can be realized through recurrent computation, especially when the representation space is wide, and that this enables parameter-efficient fine-tuning for downstream tasks.
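To make the parameter saving concrete, the following is a rough back-of-the-envelope sketch, not the paper's accounting. It counts the weights of one standard pre-norm transformer encoder block and compares a stack of independent blocks with a single shared block. The width of 768, depth of 12, and MLP ratio of 4 are ViT-Base-like values assumed purely for illustration; the summary does not state bViT's actual configuration, and the helper names `block_params` and `encoder_params` are hypothetical.

```python
def block_params(width: int, mlp_ratio: int = 4) -> int:
    """Parameters in one standard transformer encoder block (assumed layout)."""
    # Q, K, V and output projections, each width x width plus a bias vector.
    attention = 4 * (width * width + width)
    # Two linear layers: width -> hidden and hidden -> width, both with biases.
    hidden = mlp_ratio * width
    mlp = (width * hidden + hidden) + (hidden * width + width)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * width)
    return attention + mlp + layer_norms


def encoder_params(width: int, depth: int, shared: bool) -> int:
    """Encoder parameters; a shared block is stored once however often it runs."""
    return block_params(width) * (1 if shared else depth)


# Illustrative sizes only (assumption, not taken from the source).
width, depth = 768, 12
stacked = encoder_params(width, depth, shared=False)
recurrent = encoder_params(width, depth, shared=True)

print(f"stacked encoder:   {stacked:,} parameters")
print(f"recurrent encoder: {recurrent:,} parameters")
print(f"ratio:             {stacked / recurrent:.1f}x")
```

Because the block size grows roughly with the square of the width, a recurrent model that widens its representation space gives back part of this saving; the count above holds the width fixed and ignores the patch embedding and classification head.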

Impact: Introduces a parameter-efficient architecture for vision transformers, potentially reducing computational costs for image recognition tasks.

Ranking rationale: The cluster contains an academic paper detailing a new model architecture.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Alberto Presta · 2026-05-11 14:43

bViT: Investigating Single-Block Recurrence in Vision Transformers for Image Recognition

Vision Transformers (ViTs) are built by stacking independently parameterized blocks, but it remains unclear how much of this depth requires layer specific transformations and how much can be realized through recurrent computation. We study this question with bViT, a single-block …
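The contrast the abstract draws between stacking and recurrence can be sketched with a deliberately tiny toy model. Everything here is an assumption for illustration: `apply_block` is a simple residual update standing in for a real attention-plus-MLP block, and none of the names or sizes come from the paper. The sketch only shows the structural difference, namely that a stacked model owns one weight set per layer while a recurrent model reuses one weight set at every step.

```python
import math
import random


def make_block(width, seed):
    """Random width x width matrix standing in for one block's parameters."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(width)
    return [[rng.gauss(0.0, scale) for _ in range(width)] for _ in range(width)]


def apply_block(weights, x):
    """Toy residual update x + tanh(W x); NOT real attention or an MLP."""
    return [xi + math.tanh(sum(w * xj for w, xj in zip(row, x)))
            for xi, row in zip(x, weights)]


def stacked_forward(blocks, x):
    """Standard depth: each layer applies its own weights."""
    for weights in blocks:
        x = apply_block(weights, x)
    return x


def recurrent_forward(block, x, steps):
    """Recurrent depth: the same weights are applied at every step."""
    for _ in range(steps):
        x = apply_block(block, x)
    return x


def count_params(blocks):
    """Count parameters, counting a shared matrix only once."""
    unique = {id(b): b for b in blocks}
    return sum(len(row) for b in unique.values() for row in b)


width, depth = 8, 6  # toy sizes, assumed for illustration
x0 = [0.1 * i for i in range(width)]
independent = [make_block(width, seed=s) for s in range(depth)]
shared = make_block(width, seed=0)

y_stacked = stacked_forward(independent, x0)
y_recurrent = recurrent_forward(shared, x0, steps=depth)
stacked_count = count_params(independent)
recurrent_count = count_params([shared] * depth)
print(stacked_count, recurrent_count)
```

Both forward passes perform the same number of block applications, so compute per image is unchanged in this toy; only the number of stored parameters differs.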

