ENTITY dolma

PulseAugur coverage of dolma: every cluster that mentions dolma across labs, papers, and developer communities, ranked by signal.

Total · 30d: 3 (3 over 90d)
Releases · 30d: 0 (0 over 90d)
Papers · 30d: 3 (3 over 90d)

SENTIMENT · 30D

1 day with sentiment data

RECENT · PAGE 1/1 · 3 TOTAL

TOOL · CL_106699 · Jun 17 · 00:00

New framework analyzes narrative structures in LLM pretraining data

Researchers have developed a new framework and model, NarraBERT, to analyze narrative structures within large language model (LLM) pretraining data. This analysis, applied to the 3-trillion-token Dolma corpus, reveals m…

RESEARCH · CL_98078 · Jun 17 · 00:00

New framework analyzes narrative structure in LLM pretraining data · 4 sources tracked

Researchers have developed a new framework and model, NarraBERT, to analyze narrative structures within large language model (LLM) pretraining data. The study applied this framework to the 3-trillion-token Dolma corpus,…

TOOL · CL_29413 · May 12 · 16:45

LLM popularity bias driven by pretraining data exposure, study finds

Researchers have analyzed how large language models (LLMs) develop preferences for well-known entities, a phenomenon often linked to popularity bias. Using the open OLMo models and their complete Dolma pretraining corpu…