Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 6h

DRIFT: Refining Instruction Data via On-Policy Data Attribution

Researchers have introduced DRIFT, a method that refines instruction data to raise the performance ceiling of large language models. Where existing data curation techniques select a subset of the available data, DRIFT aims to improve the data distribution itself. It relies on on-policy influence functions, which use the model's own rollouts as validation targets, to address two limitations of standard influence function formulations: proximity gaps and gradient norm bias. In experiments with 7B-parameter models, DRIFT improves performance on instruction and reasoning tasks and outperforms current data curation baselines.
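To make the idea of gradient norm bias concrete, here is a minimal sketch of first-order, gradient-similarity influence scoring on a toy logistic regression model. It is not DRIFT's algorithm, which the brief does not spell out. The function names, the toy data, and the use of labels sampled from the model's own predictions as a stand-in for on-policy rollouts are all assumptions made for illustration.

```python
import numpy as np

def per_example_grads(w, X, y):
    """Per-example gradient of the logistic loss with respect to w, shape (n, d)."""
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return (p - y)[:, None] * X

def influence_scores(train_grads, val_grads, normalize=True):
    """Alignment of each training gradient with the mean validation gradient.

    normalize=False gives a raw dot product, which grows with gradient norm.
    normalize=True gives cosine similarity, which ignores gradient norm.
    """
    target = val_grads.mean(axis=0)
    if normalize:
        eps = 1e-12  # guards against division by zero
        norms = np.linalg.norm(train_grads, axis=1, keepdims=True)
        train_grads = train_grads / (norms + eps)
        target = target / (np.linalg.norm(target) + eps)
    return train_grads @ target

rng = np.random.default_rng(0)
w = rng.normal(size=4)  # toy "model" parameters

# Toy training set (hypothetical data).
X_train = rng.normal(size=(6, 4))
y_train = rng.integers(0, 2, size=6).astype(float)

# Assumption: labels sampled from the model's own predictive distribution
# stand in for on-policy rollouts used as validation targets.
X_val = rng.normal(size=(5, 4))
p_val = 1.0 / (1.0 + np.exp(-(X_val @ w)))
y_val = (rng.random(5) < p_val).astype(float)

train_g = per_example_grads(w, X_train, y_train)
val_g = per_example_grads(w, X_val, y_val)

raw = influence_scores(train_g, val_g, normalize=False)
cos = influence_scores(train_g, val_g, normalize=True)

# Inflate one training gradient to mimic an example with a large gradient norm.
inflated = train_g.copy()
inflated[0] *= 50.0
raw_inflated = influence_scores(inflated, val_g, normalize=False)
cos_inflated = influence_scores(inflated, val_g, normalize=True)
```

Comparing `raw_inflated[0]` with `raw[0]` shows the dot-product score scaling with the gradient's magnitude, while `cos_inflated[0]` stays equal to `cos[0]`. Normalization of this kind is one common way to counter gradient norm bias; whether DRIFT uses it is not stated in the brief.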

IMPACT This research could lead to more capable LLMs by improving the efficiency and effectiveness of training data curation.

Hugging Face
arXiv
large language models
Supervised Fine-Tuning
Influence Functions
DRIFT
7B-parameter