PulseAugur

RegMix-D advances LLM pretraining with dynamic data mixing

Researchers have introduced RegMix-D, an extension of the RegMix method for selecting data mixtures in large language model pretraining. Where RegMix fits a regression model on small-scale proxy runs to choose a single static mixture, RegMix-D uses the full loss trajectories from those proxy runs, rather than only their endpoint losses, to adjust the data mixture dynamically throughout training. The approach can operate offline or online, and it is reported to improve consistently over RegMix and DoReMi across 13 downstream tasks, even with a significantly reduced proxy compute budget.
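To make the idea of trajectory-based mixing concrete, here is a toy sketch in Python. It is not the paper's algorithm: it assumes one plain linear regression per checkpoint, a coarse grid of candidate mixtures, and synthetic noiseless proxy losses invented for illustration. All function and variable names (simplex_grid, fit_per_step_models, pick_schedule) are hypothetical.

```python
import itertools
import numpy as np


def simplex_grid(n_domains, resolution):
    """All mixtures whose weights are multiples of 1/resolution and sum to 1."""
    points = [
        p
        for p in itertools.product(range(resolution + 1), repeat=n_domains)
        if sum(p) == resolution
    ]
    return np.array(points, dtype=float) / resolution


def fit_per_step_models(mixtures, trajectories):
    """Fit one linear model per checkpoint: loss_t ~ mixture @ coef[:, t].

    mixtures:     (n_runs, n_domains) mixture weights used by each proxy run
    trajectories: (n_runs, n_steps) loss of each proxy run at each checkpoint
    No intercept is needed because the weights of a mixture sum to 1.
    """
    coef, *_ = np.linalg.lstsq(mixtures, trajectories, rcond=None)
    return coef  # shape (n_domains, n_steps)


def pick_schedule(coef, candidates):
    """For each checkpoint, pick the candidate with the lowest predicted loss."""
    predicted = candidates @ coef        # (n_candidates, n_steps)
    best = predicted.argmin(axis=0)      # best candidate index per checkpoint
    return candidates[best]              # (n_steps, n_domains)


# Synthetic proxy runs (assumption): domain 0 helps most early in training,
# domain 2 helps most late, and losses are exactly linear in the mixture.
rng = np.random.default_rng(0)
n_runs, n_domains, n_steps = 16, 3, 4
proxy_mixtures = rng.dirichlet(np.ones(n_domains), size=n_runs)
t = np.linspace(0.0, 1.0, n_steps)
true_coef = np.stack([1.0 + 2.0 * t, np.full(n_steps, 2.0), 3.0 - 2.0 * t])
proxy_losses = proxy_mixtures @ true_coef

coef = fit_per_step_models(proxy_mixtures, proxy_losses)
schedule = pick_schedule(coef, simplex_grid(n_domains, resolution=10))
print(schedule)  # one mixture per checkpoint
```

A static method in this toy setting would fit only the last column of proxy_losses and return a single mixture, whereas the per-checkpoint fit yields a schedule that shifts weight between domains over training. A purely linear fit always prefers an extreme mixture, so a realistic setup would need a richer regressor and some constraint on the schedule.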

IMPACT This method could lead to more efficient and effective LLM training by optimizing data mixture selection.

RANK_REASON The cluster describes a new method presented in an arXiv paper for improving LLM pretraining.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Kaiyan Zhao, Zhongtao Miao, Akiko Aizawa, Yoshimasa Tsuruoka

    RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

    arXiv:2606.18663v1 · Abstract: Data mixture selection is critical for Large Language Model pretraining. Existing methods such as RegMix select a single static mixture by fitting a regression model on small-scale proxy runs. We propose RegMix-D, a simple extension…

  2. arXiv cs.CL TIER_1 English(EN) · Yoshimasa Tsuruoka

    RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

    Data mixture selection is critical for Large Language Model pretraining. Existing methods such as RegMix select a single static mixture by fitting a regression model on small-scale proxy runs. We propose RegMix-D, a simple extension of RegMix to dynamic mixing. Our key observatio…