PulseAugur

RATS! New Transformer Architecture Discovers Object Parts in Vision Models

Researchers have introduced RATS (Register Attention Transformers), a new architecture for self-supervised visual models designed to discover compositional structure, much as humans recognize an object's parts. RATS uses learnable register tokens to route patch information through a bottleneck, and the registers specialize into proto-semantic regions without any explicit part annotations. On segmentation benchmarks, RATS outperforms baselines by an average of 12 mIoU, with consistent gains on datasets such as ADE20K and COCO.
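The summary does not spell out the exact attention pattern, so the following is only a minimal sketch of what "routing patch information through registers" could look like. It assumes a hypothetical two-step write/read scheme in which patches never attend to each other directly: registers first gather ("write") information from all patches, then each patch reads back only from the registers. The function name `register_bottleneck_attention`, the single attention head, the omission of learned query/key/value projections, and the sizes (196 patches, 8 registers, 64 dimensions) are all illustrative assumptions, not the RATS implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def register_bottleneck_attention(patches, registers):
    """Hypothetical register bottleneck (assumption, not the paper's code).

    patches:   (N, D) patch embeddings
    registers: (R, D) register tokens, with R much smaller than N
    Returns updated patches, updated registers, and a hard
    patch-to-register assignment of shape (N,).
    """
    d = patches.shape[1]
    scale = 1.0 / np.sqrt(d)
    # Assumption: single head, no learned Q/K/V projections.
    # Write step: each register attends over all patches.
    write_attn = softmax(registers @ patches.T * scale, axis=-1)  # (R, N)
    new_registers = registers + write_attn @ patches             # (R, D)
    # Read step: each patch attends only to the R registers, so any
    # patch-to-patch information flow has to pass through them.
    read_attn = softmax(patches @ new_registers.T * scale, axis=-1)  # (N, R)
    new_patches = patches + read_attn @ new_registers                # (N, D)
    # Index of the register each patch reads from most strongly.
    assignment = read_attn.argmax(axis=-1)                           # (N,)
    return new_patches, new_registers, assignment


# Usage with random inputs: a 14x14 grid of patches and 8 registers.
rng = np.random.default_rng(0)
patches = rng.normal(size=(196, 64))
registers = rng.normal(size=(8, 64))
p, r, assign = register_bottleneck_attention(patches, registers)
part_map = assign.reshape(14, 14)  # one register index per patch
```

Because each patch reads from only R registers instead of all N patches, the registers act as an information bottleneck. Reading `part_map` as a part segmentation only makes sense after training, which is when the summary reports that registers specialize into proto-semantic regions; with the random inputs above, the map is arbitrary.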

IMPACT: Introduces a new architectural prior for structured, interpretable visual representation learning, which could improve object recognition and segmentation.

RANK_REASON: The cluster contains a research paper describing a new model architecture.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CV TIER_1 · Timing Yang, Predrag Neskovic, Jansen Seheult, Wenchao Han, Anand Bhattad, Alan Yuille, Feng Wang

    RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers

    arXiv:2606.14701v1 · Announce Type: new · Abstract: When humans see a bird, they recognize far more than just "bird" -- they see a head, wings, and talons, a structured assembly of reusable parts that can be identified across every bird they have ever seen. We ask whether a self-supe…

  2. arXiv cs.CV TIER_1 · Feng Wang

    RATS! Patches Talk Through Registers: Emergent Parts in Register Attention Transformers

    When humans see a bird, they recognize far more than just "bird" -- they see a head, wings, and talons, a structured assembly of reusable parts that can be identified across every bird they have ever seen. We ask whether a self-supervised visual model can discover the same compos…