ENTITY FineWeb-Edu

PulseAugur coverage of FineWeb-Edu — every cluster mentioning FineWeb-Edu across labs, papers, and developer communities, ranked by signal.

Total · 90d: 11
Releases · 90d: 0
Papers · 90d: 11
Sentiment · 30d: 4 days with sentiment data

RECENT · PAGE 1/1 · 11 TOTAL

TOOL · CL_104732 · Jun 20 · 18:42

Small language model trained on single GPU detailed in new study

Researchers have detailed a method for training a small language model, L20-Edu-135M, using significantly fewer computational resources, specifically on a single NVIDIA L20 GPU. The study focused on data efficiency, uti…

RESEARCH · CL_97829 · Jun 17 · 15:11

New pretraining method enhances LLM safety with integrated reflection

Researchers have introduced a new method called Safety Reflection Pretraining, designed to enhance the safety alignment of large language models (LLMs) during the pretraining phase. This approach goes beyond simply filt…

TOOL · CL_84918 · Jun 11 · 04:00

EverydayGPT uses confidence gating to cut RAG latency by 120x

Researchers have developed EverydayGPT, a conversational question-answering system that uses a Confidence-Gated Routing (CGR) mechanism to improve efficiency. This system routes queries based on retrieval distance and e…

TOOL · CL_84812 · Jun 11 · 04:00

SoftMatcha 2 enables trillion-token search in under 0.3 seconds

Researchers have developed SoftMatcha 2, a novel algorithm designed for rapid and semantically flexible pattern matching across massive text datasets. This system can search through trillions of tokens in under a second…

TOOL · CL_65808 · Jun 2 · 04:00

Child-directed speech aids AI language production, not comprehension

A new research paper explores how child-directed speech (CDS) impacts language models, specifically focusing on production capabilities rather than just comprehension. The study found that models trained on CDS demonstr…

TOOL · CL_58840 · May 29 · 04:00

Kronecker Embeddings slash language model parameters, boost performance

Researchers have developed Kronecker Embeddings, a novel method for representing tokens in language models that significantly reduces the number of trainable parameters. This approach replaces large embedding tables wit…

TOOL · CL_51343 · May 26 · 04:00

New Interdomain Attention Merges Transformers and SSMs

Researchers have introduced Interdomain Attention, a novel mechanism that merges the strengths of Transformers and deep state space models (SSMs). This new approach integrates an SSM into an attention module using kerne…

RESEARCH · CL_28256 · May 11 · 16:26

Muown optimizer improves LLM training by controlling row-norm drift

Researchers have developed Muown, a novel optimization method designed to improve the training of large language models. Muown addresses issues with the Muon optimizer, specifically the upward drift of spectral norms in…

TOOL · CL_25579 · May 8 · 14:47

OrScale optimization method improves neural network training

Researchers have introduced OrScale, a novel optimization technique designed to enhance neural network training. OrScale builds upon the Muon method by incorporating layer-wise trust-ratio scaling, which measures the Fr…

TOOL · CL_15985 · May 5 · 04:00

Researchers explore growing Transformers with modular composition and layer-wise expansion

Researchers have explored a method for training Transformer models by incrementally adding new layers to a frozen base, maintaining a constant budget for trainable parameters. This approach, termed 'Growing Transformers…

RESEARCH · CL_14902 · May 4 · 19:11

OpenMythos project reconstructs Anthropic's secretive Claude Mythos AI model

A new open-source project called OpenMythos has been released, aiming to theoretically reconstruct the architecture of Anthropic's Claude Mythos model. This project implements a Recurrent-Depth Transformer (RDT) with a …
