The SuperActivator Mechanism: Transformers Concentrate Reliable Concept Signals in the Tail
Researchers have identified a "SuperActivator Mechanism" in transformers that concentrates reliable concept signals in a small subset of high-activation tokens. The mechanism amplifies concept activation gaps, producing a distinct positive tail in the in-concept activation distribution that stands apart from noise. This finding enables more accurate concept detection, improving F1 scores by up to 0.14 across a range of models and modalities.
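The toy sketch below shows why scoring a concept from its highest-activation tokens can beat averaging over every token when the signal sits in a few tokens. The activation values, the `topk_score` pooling rule, and `k=2` are hypothetical choices for illustration. The summary does not describe the paper's actual detection procedure, so this is not the authors' method.

```python
from statistics import mean

# Hypothetical toy data: per-token activations of one concept direction for
# four inputs (8 tokens each). The first two contain the concept, concentrated
# in two high-activation tokens; the last two do not.
TOY_SEQUENCES = [
    [0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 3.0, 2.8],  # in-concept
    [0.1, 0.0, 2.5, 0.0, 0.0, 2.7, 0.0, 0.0],  # in-concept
    [0.8, 0.9, 0.7, 0.8, 0.9, 0.8, 0.7, 0.9],  # out: broad, moderate activation
    [0.0, 0.1, 0.2, 0.1, 0.0, 0.1, 0.2, 0.1],  # out: near-zero noise
]
LABELS = [True, True, False, False]


def mean_score(acts):
    """Baseline: average activation over all tokens."""
    return mean(acts)


def topk_score(acts, k=2):
    """Hypothetical tail-focused score: average of the k largest activations."""
    return mean(sorted(acts, reverse=True)[:k])


def separation_gap(scores, labels):
    """Lowest in-concept score minus highest out-of-concept score (>0 means separable)."""
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    return min(pos) - max(neg)


def f1(preds, labels):
    """Standard F1 from boolean predictions and boolean labels."""
    tp = sum(p and lab for p, lab in zip(preds, labels))
    fp = sum(p and not lab for p, lab in zip(preds, labels))
    fn = sum(lab and not p for p, lab in zip(preds, labels))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_f1(scores, labels):
    """Best F1 over thresholds taken from the observed scores (predict score >= t)."""
    return max(f1([s >= t for s in scores], labels) for t in scores)


if __name__ == "__main__":
    for name, score_fn in [("mean", mean_score), ("top-2", topk_score)]:
        scores = [score_fn(seq) for seq in TOY_SEQUENCES]
        print(f"{name:>5}: gap={separation_gap(scores, LABELS):+.3f} "
              f"best F1={best_f1(scores, LABELS):.2f}")
```

On this toy data, mean pooling leaves the two classes overlapping (a negative gap and a best F1 of 0.80). The broadly active out-of-concept input outscores an in-concept one. Top-2 pooling separates them cleanly (a positive gap and an F1 of 1.00).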
IMPACT: Identifies a mechanism for more reliable concept detection in transformers, potentially improving interpretability and downstream applications.