Vision Transformers gain interpretability and performance with register tokens

By PulseAugur Editorial · [2 sources] · 2026-06-10 12:58

Researchers have proposed adding register tokens to Vision Transformers (ViTs) for face recognition to improve both interpretability and performance. With learnable register tokens added to the initial patch embeddings, the ViT-8R model produces more structured, easier-to-read attention maps than standard CLS-token or Concatenated Patch Embeddings (CPE) approaches. The change mitigates the attention artifacts that hamper interpretation and also achieves state-of-the-art results on large-scale benchmarks such as IJB-B and IJB-C.
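
The mechanism described above, adding learnable register tokens to the patch embeddings, can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation. The function names, the tensor sizes, the choice of eight registers (inferred only from the name ViT-8R), appending the registers at the end of the sequence, and discarding them at the output are all assumptions made for this example.

```python
import numpy as np


def add_register_tokens(patch_embeddings, registers):
    """Append the same register tokens to every sequence in the batch.

    patch_embeddings: array of shape (batch, num_patches, dim)
    registers:        array of shape (num_registers, dim); in a real model
                      these would be learnable parameters (assumption)
    """
    batch = patch_embeddings.shape[0]
    # Share one set of registers across the batch.
    expanded = np.broadcast_to(registers, (batch,) + registers.shape)
    return np.concatenate([patch_embeddings, expanded], axis=1)


def drop_register_tokens(tokens, num_registers):
    """Keep only the patch tokens, discarding the trailing registers."""
    num_patches = tokens.shape[1] - num_registers
    return tokens[:, :num_patches, :]


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
patches = rng.standard_normal((2, 9, 4))   # (batch, num_patches, dim)
registers = np.zeros((8, 4))               # 8 registers (assumed from "8R")

tokens = add_register_tokens(patches, registers)   # shape (2, 17, 4)
# A transformer encoder would process `tokens` here; it is omitted.
patch_only = drop_register_tokens(tokens, num_registers=8)  # shape (2, 9, 4)
```

In this sketch the registers take part in attention like any other token but contribute nothing to the final representation, which is built from the patch tokens alone. Whether ViT-8R handles the registers exactly this way is not stated in the summary.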

IMPACT Enhances interpretability of ViTs for face recognition, potentially leading to more trustworthy and accurate systems.

RANK_REASON The cluster contains an academic paper detailing a new method and model for face recognition.

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv cs.CV TIER_1 English(EN) · Tahar Chettaoui, Guray Ozgur, Eduarda Caldeira, Naser Damer, Fadi Boutros · 2026-06-11 04:00

Vision Transformers for Face Recognition Need More Registers

arXiv:2606.12036v1. Abstract: Recent advances in Vision Transformers (ViTs) for face recognition (FR) have moved beyond the standard CLS-token paradigm. In this paradigm, a special classification token (CLS) is prepended to the patch embeddings and used as a rep…

arXiv cs.CV TIER_1 English(EN) · Fadi Boutros · 2026-06-10 12:58

Vision Transformers for Face Recognition Need More Registers

Recent advances in Vision Transformers (ViTs) for face recognition (FR) have moved beyond the standard CLS-token paradigm. In this paradigm, a special classification token (CLS) is prepended to the patch embeddings and used as a representation of the input for downstream tasks. A…
