RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories
Researchers have introduced RegMix-D, an extension of the RegMix method for selecting data mixtures in large language model pretraining. Rather than relying only on endpoint losses, RegMix-D uses the full loss trajectories from proxy runs to adjust the data mixture dynamically throughout training. The approach can operate offline or online and has shown consistent improvements over existing methods such as RegMix and DoReMi across 13 downstream tasks, even with a significantly reduced proxy compute budget.
AI IMPACT: This method could make LLM training more efficient and effective by optimizing data mixture selection.
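To make the trajectory idea concrete, here is a minimal sketch of one way per-checkpoint loss curves from proxy runs could drive a mixture schedule. It is an illustration under stated assumptions, not the RegMix-D algorithm: the linear regression model, the random candidate search, the synthetic data, and the names `fit_per_checkpoint` and `schedule_mixtures` are all hypothetical.

```python
import numpy as np


def fit_per_checkpoint(mixtures, trajectories):
    """Fit loss ~ mixture weights by least squares, one model per checkpoint.

    mixtures:     (n_runs, n_domains) array, each row sums to 1.
    trajectories: (n_runs, n_checkpoints) array of proxy-run losses.
    Returns coefficients of shape (n_domains, n_checkpoints).
    No intercept is needed because each mixture row sums to 1.
    """
    coefs, *_ = np.linalg.lstsq(mixtures, trajectories, rcond=None)
    return coefs


def schedule_mixtures(coefs, candidates):
    """For each checkpoint, pick the candidate with the lowest predicted loss."""
    predicted = candidates @ coefs      # (n_candidates, n_checkpoints)
    best = predicted.argmin(axis=0)     # best candidate index per checkpoint
    return candidates[best]             # (n_checkpoints, n_domains)


# Synthetic stand-in for proxy runs (assumption: purely illustrative data).
rng = np.random.default_rng(0)
n_runs, n_domains, n_checkpoints = 64, 3, 4
mixtures = rng.dirichlet(np.ones(n_domains), size=n_runs)

# Hypothetical per-domain loss contributions that shift over training.
true_coefs = np.array([
    [3.0, 2.5, 2.0, 1.9],
    [3.2, 2.2, 2.1, 2.0],
    [3.1, 2.6, 1.8, 1.5],
])
noise = rng.normal(0.0, 0.01, size=(n_runs, n_checkpoints))
trajectories = mixtures @ true_coefs + noise

coefs = fit_per_checkpoint(mixtures, trajectories)
candidates = rng.dirichlet(np.ones(n_domains), size=1000)
schedule = schedule_mixtures(coefs, candidates)

for step, weights in enumerate(schedule):
    print(f"checkpoint {step}: mixture = {np.round(weights, 3)}")
```

An endpoint-only variant would fit a single model on the last column of `trajectories` and return one static mixture. Fitting every checkpoint lets the preferred mixture change as training progresses, which is the contrast the summary draws. Because the sketch uses a linear model, each stage simply favors the domain with the lowest fitted coefficient; a more realistic regressor would produce less extreme schedules.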