ClustViT paper introduces token merging for efficient semantic segmentation

By the PulseAugur editorial team · [1 source] · 2026-05-04 04:00

Researchers have introduced ClustViT, an approach that makes Vision Transformers more efficient for semantic segmentation. A trainable Cluster module, guided by segmentation masks, merges similar tokens to reduce computational cost. A subsequent Regenerator module then restores fine details. The result is faster inference and fewer GFLOPs at comparable accuracy on various datasets.
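To make the merge-then-restore idea concrete, here is a minimal sketch in plain Python. It is not the paper's implementation. In ClustViT the Cluster module is trainable and guided by segmentation masks, whereas this sketch takes the cluster assignment as a given input. The function names (`merge_tokens`, `regenerate`, `attention_pairs`) and the toy data are hypothetical. The restore step simply copies each merged token back to its original positions, which is a placeholder for the paper's Regenerator rather than a reproduction of it.

```python
from collections import defaultdict


def merge_tokens(tokens, cluster_ids):
    """Average all tokens that share a cluster id (assumed given, not learned)."""
    groups = defaultdict(list)
    for tok, cid in zip(tokens, cluster_ids):
        groups[cid].append(tok)
    order = sorted(groups)  # deterministic cluster order
    merged = [
        [sum(col) / len(groups[cid]) for col in zip(*groups[cid])]
        for cid in order
    ]
    return merged, order


def regenerate(merged, order, cluster_ids):
    """Naive restore: copy each merged token back to every position of its cluster."""
    index = {cid: i for i, cid in enumerate(order)}
    return [list(merged[index[cid]]) for cid in cluster_ids]


def attention_pairs(n_tokens):
    """Token pairs compared by full self-attention, which grows quadratically."""
    return n_tokens * n_tokens


# Toy example: five 2-dimensional tokens assigned to three clusters.
tokens = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
cluster_ids = [0, 0, 1, 1, 2]

merged, order = merge_tokens(tokens, cluster_ids)
restored = regenerate(merged, order, cluster_ids)

print(len(tokens), "tokens ->", len(merged), "merged tokens")
print("attention pairs:", attention_pairs(len(tokens)), "->", attention_pairs(len(merged)))
```

Because self-attention compares every token with every other token, going from five tokens to three in this toy case cuts the pair count from 25 to 9. That quadratic relationship is the source of the savings the summary describes.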

Impact: Reduces computational cost for semantic segmentation models, potentially enabling wider use in resource-constrained environments like robotics.

Ranking rationale: This is a research paper detailing a new method for improving Vision Transformers for semantic segmentation.

Read on arXiv cs.CV

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV TIER_1 English(EN) · Fabio Montello, Ronja Güldenring, Lazaros Nalpantidis · 2026-05-04 04:00

ClustViT: Clustering-based Token Merging for Semantic Segmentation

arXiv:2510.01948v2 Announce Type: replace Abstract: Vision Transformers can achieve high accuracy and strong generalization across various contexts, but their practical applicability on real-world robotic systems is limited due to their quadratic attention complexity. Recent work…
