PulseAugur / Brief

last 24h
[1/1] 222 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. BLISS: A Lightweight Bilevel Influence Scoring Method for Data Selection in Language Model Pretraining

    Researchers have developed BLISS, a method for selecting data to pretrain large language models more efficiently. Unlike previous methods, BLISS does not require external pretrained models, and it accounts for the long-term impact of data by using a proxy model and a score model. This bilevel optimization approach lets BLISS predict an influence score for each training sample, so that high-quality data can be selected. In experiments with Pythia and LLaMA models, BLISS reached target performance 1.7x faster than state-of-the-art methods.

    IMPACT: Enables faster, more efficient pretraining of large language models by optimizing data selection.
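
To make the selection step in the summary above concrete, here is a minimal sketch of keeping the highest-scoring samples once per-sample influence scores are available. It is not the BLISS algorithm: the bilevel training of the proxy and score models is omitted, and `select_top_fraction`, `score_fn`, and `keep_fraction` are hypothetical names introduced only for illustration.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def select_top_fraction(
    samples: Sequence[T],
    score_fn: Callable[[T], float],
    keep_fraction: float,
) -> List[T]:
    """Keep the highest-scoring fraction of samples.

    score_fn is a stand-in (assumption) for a trained score model that
    maps a training sample to a predicted influence score.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if not samples:
        return []
    # Always keep at least one sample.
    k = max(1, int(len(samples) * keep_fraction))
    # Rank from highest to lowest predicted score; ties keep input order.
    ranked = sorted(samples, key=score_fn, reverse=True)
    return ranked[:k]


# Toy usage: string length stands in for a predicted influence score.
toy_samples = ["a", "bbb", "cc", "dddd"]
selected = select_top_fraction(toy_samples, score_fn=len, keep_fraction=0.5)
```

In a real pipeline the scores would come from a learned model rather than a hand-written function, and selection would typically run over batches or shards instead of a single in-memory list.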