DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification
Researchers have developed DecSelfMask, a method that uses unlabeled data to improve classification performance in decoder-only language models. The approach applies a relevance-guided masking strategy: it identifies crucial text segments, masks them, and trains the model to reconstruct them. On a dataset of 1.9 million clinical notes, DecSelfMask outperformed standard supervised fine-tuning by nearly 20 points in Macro F1.
AI IMPACT Enhances classification capabilities of decoder-only models, potentially reducing reliance on expensive labeled data in specialized domains.
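The masking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not say how relevance is scored or how many tokens are masked, so the per-token `relevance` scores, the `ratio` parameter, the `<mask>` placeholder, and the function name `mask_by_relevance` are all assumptions made for illustration. The sketch only shows the general idea of hiding the highest-scoring tokens and keeping them as reconstruction targets.

```python
from typing import List, Tuple

MASK = "<mask>"  # hypothetical placeholder token, not taken from the source


def mask_by_relevance(
    tokens: List[str], relevance: List[float], ratio: float = 0.3
) -> Tuple[List[str], List[int], List[str]]:
    """Mask the highest-relevance tokens and return reconstruction targets.

    Assumes `relevance` holds one precomputed score per token; how such
    scores are obtained is outside the scope of this sketch.
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")
    if not tokens:
        return [], [], []

    # Number of tokens to mask: at least one, at most all of them.
    k = min(len(tokens), max(1, int(len(tokens) * ratio)))

    # Rank positions by descending relevance; ties go to the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-relevance[i], i))
    masked_positions = sorted(ranked[:k])
    chosen = set(masked_positions)

    # Replace the chosen tokens with the placeholder; keep originals as targets.
    masked_tokens = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    targets = [tokens[i] for i in masked_positions]
    return masked_tokens, masked_positions, targets


example_tokens = ["patient", "reports", "severe", "chest", "pain", "today"]
example_scores = [0.1, 0.05, 0.7, 0.9, 0.8, 0.02]  # made-up scores
masked, positions, targets = mask_by_relevance(example_tokens, example_scores, ratio=0.5)
print(masked)
print(targets)
```

In a training setup of this kind, the masked sequence would serve as the model input and the target tokens as what the model learns to reconstruct; the loss function and model wiring are left out because the summary does not describe them.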