Brief

last 24h

[2/2] 224 sources

Multi-source AI news, clustered, deduplicated, and scored 0–100 on authority, cluster strength, headline signal, and time decay.

RESEARCH · arXiv cs.CL English(EN) · 1d · [2 sources]

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

Researchers have introduced RegMix-D, an extension of the RegMix method for selecting data mixtures in large language model pretraining. RegMix-D uses the full loss trajectories from proxy runs, rather than only their endpoint losses, to adjust the data mixture dynamically during training. The approach can operate offline or online, and it has shown consistent improvements over RegMix and DoReMi across 13 downstream tasks, even with a significantly reduced proxy compute budget.

IMPACT This method could lead to more efficient and effective LLM training by optimizing data mixture selection.
- RegMix-D
- RegMix
- DoReMi
- Pile dataset
- Hugging Face
- arXiv
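To make the trajectory idea in the summary above concrete, here is a minimal sketch, not the paper's algorithm. It assumes a plain linear regression from mixture weights to proxy loss, fitted separately at each checkpoint, and then picks the candidate mixture with the lowest predicted loss per checkpoint. The function names `fit_per_checkpoint` and `pick_schedule`, the synthetic data, and the linear model are all illustrative assumptions.

```python
import numpy as np


def fit_per_checkpoint(mixtures, losses):
    """Fit one linear model per checkpoint (hypothetical helper).

    mixtures: (N, K) mixture weights of N proxy runs over K domains.
    losses:   (N, T) proxy loss of each run at T checkpoints.
    Returns coefficients of shape (T, K). No intercept is fitted,
    because the weights sum to 1 and would absorb a constant term.
    """
    coefs, *_ = np.linalg.lstsq(mixtures, losses, rcond=None)  # (K, T)
    return coefs.T


def pick_schedule(coefs, candidates):
    """Pick, for each checkpoint, the candidate with lowest predicted loss."""
    preds = candidates @ coefs.T      # (M, T) predicted losses
    best = preds.argmin(axis=0)       # index of best candidate per checkpoint
    return candidates[best]           # (T, K) mixture schedule


# Synthetic stand-in for proxy runs (assumed data, not from the paper).
rng = np.random.default_rng(0)
K, N, T, M = 3, 32, 4, 500
mixtures = rng.dirichlet(np.ones(K), size=N)
true_coefs = rng.uniform(1.0, 3.0, size=(T, K))
losses = mixtures @ true_coefs.T      # noise-free, linear by construction

coefs = fit_per_checkpoint(mixtures, losses)
candidates = rng.dirichlet(np.ones(K), size=M)
schedule = pick_schedule(coefs, candidates)
print(schedule.round(3))              # one mixture per checkpoint
```

An endpoint-only variant would fit a single model on the final column of `losses` and return one fixed mixture. Fitting per checkpoint is what lets the chosen mixture change over the course of training in this sketch.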

TOOL · arXiv cs.AI English(EN) · 3w

GEM: Geometric Entropy Mixing for Optimal LLM Data Curation

Researchers have introduced GEM (Geometric Entropy Mixing), a framework for optimizing data curation for large language models (LLMs). GEM reformulates data mixing as a variational problem on a hypersphere and uses a mixing-balance regularizer to overcome the limitations of existing categorization methods such as human taxonomies and Euclidean clustering. The framework uses a provable Minorize-Maximize algorithm to discover balanced semantic structures, and it has shown improvements of up to 1.2% in average downstream accuracy when integrated with existing mixing strategies.

IMPACT This new geometric approach to data curation could lead to more efficient and effective LLM training, potentially improving model performance on downstream tasks.
