Self-pretraining boosts Transformer sequence classification accuracy

By PulseAugur Editorial Team · [1 source] · 2026-05-20 11:56

Researchers have investigated the effectiveness of self-pretraining (SPT) for Transformer models on sequence classification tasks. Their work replicates and ablates previous findings, and suggests that SPT improves optimization by enabling models to learn useful attention patterns. Specifically, the study highlights that SPT helps models learn proximity interactions: absolute positional encodings are transformed into attention scores that are biased towards nearby elements. In certain Transformer configurations this proves more effective than standard supervised training, because label supervision can overlook beneficial attention directions that masked reconstruction can detect.
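To make the proximity idea concrete, the following is a minimal sketch rather than the study's method. It assumes standard sinusoidal absolute positional encodings and identity query/key projections (both assumptions, not details taken from the paper), and shows that dot-product scores between such encodings depend only on the offset between two positions and are largest at offset zero. The helper names `sinusoidal_positions` and `proximity_scores` are hypothetical.

```python
import numpy as np


def sinusoidal_positions(seq_len: int, dim: int) -> np.ndarray:
    """Sinusoidal absolute positional encodings (assumed scheme; dim must be even)."""
    positions = np.arange(seq_len)[:, None]                 # (seq_len, 1)
    freqs = 1.0 / (10000 ** (np.arange(0, dim, 2) / dim))   # (dim / 2,)
    angles = positions * freqs[None, :]                     # (seq_len, dim / 2)
    enc = np.zeros((seq_len, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc


def proximity_scores(seq_len: int = 32, dim: int = 64) -> np.ndarray:
    """Scaled dot-product scores using positions only.

    Assumption: query and key projections are the identity, so the score
    between positions i and j is the sum over frequencies of
    cos(freq * (i - j)), divided by sqrt(dim).
    """
    enc = sinusoidal_positions(seq_len, dim)
    return enc @ enc.T / np.sqrt(dim)


scores = proximity_scores()
# Each position scores highest against itself, and the score for a pair
# depends only on how far apart the two positions are.
print(scores.argmax(axis=1)[:5])
print(np.isclose(scores[3, 5], scores[10, 12]))
```

In a trained model the learned query and key projections decide how much of this positional structure reaches the attention scores; the sketch only shows that the structure is available in the encodings.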

Impact: Enhances Transformer performance on sequence classification by improving attention mechanisms and overcoming limitations of standard supervised training.

Ranking rationale: Academic paper detailing a novel training technique for sequence classification models.

Read on arXiv cs.LG →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.LG TIER_1 English(EN) · Antonio Orvieto · 2026-05-20 11:56

Towards Understanding Self-Pretraining for Sequence Classification

Amos et al. (2024) showed that the accuracy of Transformer models in sequence classification can be significantly improved by first pretraining with a masked token prediction objective without external data or augmentation, a procedure referred to as self-pretraining (SPT). While…
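The masked token prediction objective mentioned in the abstract can be sketched as follows. This is an illustrative sketch only: the mask token id, the 15% masking rate, and the helper names `mask_tokens` and `masked_cross_entropy` are assumptions, not details from the paper. The point it illustrates is that the inputs come from the task's own training sequences and that class labels are never used at this stage.

```python
import numpy as np

MASK_ID = 0  # hypothetical id reserved for the mask token


def mask_tokens(tokens, mask_prob=0.15, rng=None):
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted input, the original tokens (the reconstruction
    targets), and a boolean array marking the masked positions.
    mask_prob=0.15 is an assumed value, not one taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    tokens = np.asarray(tokens)
    is_masked = rng.random(tokens.shape) < mask_prob
    corrupted = np.where(is_masked, MASK_ID, tokens)
    return corrupted, tokens, is_masked


def masked_cross_entropy(logits, targets, is_masked):
    """Mean cross-entropy over masked positions only.

    logits has shape (seq_len, vocab_size); targets and is_masked have
    shape (seq_len,).
    """
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -log_probs[np.arange(len(targets)), targets]
    return float((nll * is_masked).sum() / max(is_masked.sum(), 1))


# Usage: corrupt one training sequence; a model would then be trained to
# predict `targets` at the masked positions before supervised fine-tuning.
corrupted, targets, is_masked = mask_tokens(
    [5, 8, 2, 9, 4, 7], rng=np.random.default_rng(0)
)
```

After this stage, the same model would be trained on the classification labels in the usual supervised way.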
