PulseAugur

New method boosts decoder-only model classification with unlabeled data

Researchers have developed DecSelfMask, a novel method to improve classification performance in decoder-only language models using unlabeled data. This approach employs a relevance-guided masking strategy, identifying crucial text segments and training the model to reconstruct them. DecSelfMask demonstrated significant gains, outperforming standard supervised fine-tuning by nearly 20 points in Macro F1 on a dataset of 1.9 million clinical notes.
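To make the idea of relevance-guided masking concrete, here is a minimal Python sketch. It is not the DecSelfMask implementation: the summary does not say how relevance is computed, so the scores are passed in as plain numbers, and the function name `mask_most_relevant`, the `[MASK]` placeholder, and the `mask_ratio` parameter are all hypothetical. The sketch only shows the general pattern of hiding the highest-scoring tokens and keeping them as reconstruction targets.

```python
from typing import List, Sequence, Tuple

MASK = "[MASK]"  # hypothetical placeholder token, not taken from the paper


def mask_most_relevant(
    tokens: Sequence[str],
    relevance: Sequence[float],
    mask_ratio: float = 0.3,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Mask the highest-scoring tokens and return the reconstruction targets.

    `relevance` is assumed to come from some external scoring step; how the
    actual method derives it is not described in the summary.
    """
    if len(tokens) != len(relevance):
        raise ValueError("tokens and relevance must have the same length")
    if not tokens:
        return [], []
    # Number of positions to hide; always mask at least one token.
    k = max(1, round(mask_ratio * len(tokens)))
    # Rank positions by descending relevance; ties keep the earlier position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-relevance[i], i))
    chosen = sorted(ranked[:k])
    masked = list(tokens)
    targets: List[Tuple[int, str]] = []
    for i in chosen:
        targets.append((i, tokens[i]))  # what the model would have to reconstruct
        masked[i] = MASK
    return masked, targets


tokens = ["patient", "reports", "severe", "chest", "pain", "today"]
scores = [0.2, 0.1, 0.7, 0.9, 0.8, 0.05]  # made-up relevance scores
masked, targets = mask_most_relevant(tokens, scores, mask_ratio=0.5)
print(masked)   # ['patient', 'reports', '[MASK]', '[MASK]', '[MASK]', 'today']
print(targets)  # [(2, 'severe'), (3, 'chest'), (4, 'pain')]
```

In a training setup along these lines, the masked sequence would be the model input and the `targets` pairs would define the reconstruction loss. Random masking would use the same code with random scores, which is one way to see what the relevance guidance adds.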

IMPACT Enhances classification capabilities of decoder-only models, potentially reducing reliance on expensive labeled data in specialized domains.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Pietro Ferrazzi, Matteo Merler, Giovanni Bonetta, Alberto Lavelli, Bernardo Magnini ·

    DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification

    arXiv:2606.09466v2. Abstract: Classification tasks require annotated data, which can often be expensive, time-consuming, or even unfeasible to collect. This is the case of the medical domain, where large datasets often have few annotated examples. To address…

  2. arXiv cs.CL TIER_1 English(EN) · Bernardo Magnini ·

    DECSELFMASK: Leveraging Unlabeled Text via Self-Relevance-Guided Masking for Decoder-Only Classification

    Classification tasks require annotated data, which can often be expensive, time-consuming, or even unfeasible to collect. This is the case of the medical domain, where large datasets often have few annotated examples. To address this, we propose DecSelfMask (Decoder Self-learning…