PulseAugur
research · [1 source]

DataComp team releases DataComp-LM, a 7B model with open data and benchmarks

DataComp-LM, a new 7-billion-parameter language model, has been released. It is notable for being trained exclusively on open data, and it ships with a new benchmark and dataset intended to support further research and development in open-access AI.

Summary written by gemini-2.5-flash-lite from 1 source.

RANK_REASON Release of a new open-source model, benchmark, and dataset.

COVERAGE [1]

  1. Smol AINews TIER_1 Bahasa(ID)

    DataComp-LM: the best open-data 7B model/benchmark/dataset

    **DataComp team** released a competitive **7B open data language model** trained on only **2.5T tokens** from the massive **DCLM-POOL dataset** of **240 trillion tokens**, showing superior scaling trends compared to FineWeb. **OpenAI** launched **GPT-4o mini**, a cost-effective m…