PulseAugur
ViT-Up: Faithful Feature Upsampling for Vision Transformers

ViT-Up framework improves feature upsampling for Vision Transformers

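Researchers have developed ViT-Up, a new framework for improving feature upsampling in Vision Transformers (ViTs). Unlike previous methods that rely on external image guidance, ViT-Up builds its queries from intermediate ViT hidden states, which lets it predict features at arbitrary coordinates while staying aligned with the backbone features. The approach targets a limitation of ViTs in dense prediction tasks: running them on large token grids is computationally expensive.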
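To make the idea of predicting features at arbitrary coordinates concrete, here is a minimal NumPy sketch. It is not the ViT-Up architecture, which the summary does not specify. It only illustrates one plausible reading: a query is built from an intermediate hidden state sampled at a continuous coordinate, and that query attends over the final backbone tokens. The function names, the projection matrices `w_q` and `w_k`, and the use of bilinear sampling to form the query are all assumptions made for illustration.

```python
import numpy as np

def bilinear_sample(tokens, coords):
    """Bilinearly interpolate a (H, W, C) token grid at (N, 2) (y, x) grid coords."""
    H, W, _ = tokens.shape
    y = np.clip(coords[:, 0], 0, H - 1)
    x = np.clip(coords[:, 1], 0, W - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, H - 1)
    x1 = np.minimum(x0 + 1, W - 1)
    wy = (y - y0)[:, None]
    wx = (x - x0)[:, None]
    top = (1 - wx) * tokens[y0, x0] + wx * tokens[y0, x1]
    bottom = (1 - wx) * tokens[y1, x0] + wx * tokens[y1, x1]
    return (1 - wy) * top + wy * bottom

def coordinate_query_upsample(hidden, final, coords, w_q, w_k):
    """Hypothetical sketch: queries from an intermediate hidden state attend
    over the final backbone tokens. Not the method from the paper."""
    H, W, C = final.shape
    flat = final.reshape(H * W, C)
    q = bilinear_sample(hidden, coords) @ w_q        # (N, D) queries
    k = flat @ w_k                                   # (H*W, D) keys
    logits = q @ k.T / np.sqrt(q.shape[1])
    logits -= logits.max(axis=1, keepdims=True)      # numerically stable softmax
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ flat                               # (N, C) mix of backbone features

rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 4, 8))   # stand-in intermediate hidden state
final = rng.normal(size=(4, 4, 8))    # stand-in final backbone features
w_q = rng.normal(size=(8, 16))        # random, untrained projections
w_k = rng.normal(size=(8, 16))
coords = np.array([[0.5, 0.5], [2.25, 3.0], [3.0, 0.0]])
out = coordinate_query_upsample(hidden, final, coords, w_q, w_k)
print(out.shape)  # (3, 8)
```

Because each output row is a softmax-weighted average of the final tokens, it stays within the per-channel range of the backbone features. That is one simple way to keep predictions tied to the backbone, though the paper may enforce alignment quite differently.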

Impact: ViT-Up's feature upsampling method could improve Vision Transformer performance on dense prediction tasks.

Ranking rationale: This cluster contains a paper detailing a new method for improving feature upsampling in Vision Transformers.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

  1. arXiv cs.CV TIER_1 English(EN) · Krispin Wandel, Jingchuan Wang, Hesheng Wang ·

    ViT-Up: Faithful Feature Upsampling for Vision Transformers

    arXiv:2606.14024v1 Announce Type: new Abstract: Vision Transformers (ViTs) have become a dominant architecture for visual representation learning, providing exceptionally strong and broadly reusable backbone features. However, ViTs are commonly operated on relatively small patch-…

  2. arXiv cs.CV TIER_1 English(EN) · Hesheng Wang ·

    ViT-Up: Faithful Feature Upsampling for Vision Transformers

    Vision Transformers (ViTs) have become a dominant architecture for visual representation learning, providing exceptionally strong and broadly reusable backbone features. However, ViTs are commonly operated on relatively small patch-token grids due to the quadratic cost of global …
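The quadratic cost mentioned in the abstract can be illustrated with simple arithmetic. This assumes standard all-pairs attention over patch tokens. The image size (224) and patch sizes (16 and 4) are illustrative choices, not values taken from the paper.

```python
def attention_pairs(image_size, patch_size):
    """Return (number of patch tokens, number of token pairs) for a square image."""
    side = image_size // patch_size
    tokens = side * side
    return tokens, tokens * tokens

base_tokens, base_pairs = attention_pairs(224, 16)    # 14 x 14 grid
dense_tokens, dense_pairs = attention_pairs(224, 4)   # 56 x 56 grid

print(base_tokens, base_pairs)     # 196 38416
print(dense_tokens, dense_pairs)   # 3136 9834496
print(dense_pairs // base_pairs)   # 256
```

Making the grid 4 times finer per side multiplies the token count by 16 and the number of pairwise interactions by 256. This is why upsampling features after the backbone can be more attractive than running the backbone itself on a dense grid.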