PulseAugur

Researchers analyze Transformer representational collapse and propose new remedies

A new paper analyzes representational collapse in Transformer models and challenges earlier findings on the roles of MLPs and Layer Normalization. It clarifies that Layer Normalization preserves affine rank and that residual connections prevent rank collapse even without MLPs. The paper also identifies a distinct problem, head-channel non-identifiability in multi-head attention, and proposes a position-gated output projection as a partial remedy.
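The rank-collapse claim can be illustrated with a small numerical sketch. The code below is not the paper's experiment: it assumes a single-head, attention-only stack with random Gaussian weights, no MLP and no Layer Normalization, and all sizes, the weight scale, and the helper names (`relative_residual`, `run_stack`) are chosen here purely for illustration. It measures how far the token matrix is from having identical rows, with and without skip connections.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def relative_residual(x):
    """Frobenius distance from x to the nearest matrix whose rows are all
    identical (the mean row), divided by the norm of x.
    A value near 0 means every token has collapsed onto one vector."""
    mean_row = x.mean(axis=0, keepdims=True)
    return float(np.linalg.norm(x - mean_row) / np.linalg.norm(x))

def run_stack(x, layers, use_skip):
    """Apply single-head self-attention layers, optionally with skip connections."""
    d = x.shape[1]
    for w_q, w_k, w_v in layers:
        scores = (x @ w_q) @ (x @ w_k).T / np.sqrt(d)
        update = softmax(scores) @ (x @ w_v)
        x = x + update if use_skip else update
    return x

# Illustrative sizes and weight scale (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
n_tokens, d_model, n_layers = 8, 16, 6
scale = 0.5 / np.sqrt(d_model)
layers = [
    tuple(scale * rng.standard_normal((d_model, d_model)) for _ in range(3))
    for _ in range(n_layers)
]
x0 = rng.standard_normal((n_tokens, d_model))

collapsed = relative_residual(run_stack(x0, layers, use_skip=False))
with_skip = relative_residual(run_stack(x0, layers, use_skip=True))

print(f"attention only:        {collapsed:.3e}")
print(f"with skip connections: {with_skip:.3e}")
```

Under these assumptions the attention-only residual should shrink toward zero within a few layers, while the stack with skip connections keeps its tokens distinct. The sketch says nothing about affine rank under Layer Normalization or about head-channel non-identifiability, which the paper treats separately.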

Impact: Provides a more precise understanding of the limitations of the Transformer architecture and of potential remedies.

Ranking rationale: Academic paper analyzing Transformer architecture and representational collapse.

Read on arXiv stat.ML →

AI-generated summary · Google Gemini · based on 2 sources.


Sources [2]

  1. arXiv stat.ML · TIER_1 · English (EN) · Giansalvo Cirrincione

    Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers

    arXiv:2604.23681v1 (cross-listed). Abstract: A widely cited result by Dong et al. (2021) showed that Transformers built from self-attention alone, without skip connections or feed-forward layers, suffer from rapid rank collapse: all token representations converge to a single…

  2. arXiv stat.ML · TIER_1 · English (EN) · Giansalvo Cirrincione

    Rank, Head-Channel Non-Identifiability, and Symmetry Breaking: A Precise Analysis of Representational Collapse in Transformers

    A widely cited result by Dong et al. (2021) showed that Transformers built from self-attention alone, without skip connections or feed-forward layers, suffer from rapid rank collapse: all token representations converge to a single direction. The proposed remedy was the MLP. We sh…