Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 7h

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

A new research paper identifies a "Quality-Utility Paradox" in distilling mathematical reasoning from powerful AI models into smaller ones. The study finds that data refined by a stronger "Oracle" model scores higher on quality metrics yet yields worse small-model performance than data selected through rejection sampling. The authors attribute this to distributional drift: Oracle refinement shifts the data away from the small model's native reasoning distribution, which increases the small model's adaptation cost. To address this, they propose "Style-Aligned Refinement," a method that balances logical repair against compatibility with that native distribution, thereby improving utility.
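The brief contrasts two ways of building distillation data, and a toy example can make the "drift" intuition concrete. The sketch below is an illustration under stated assumptions, not the paper's method. It keeps the small model's own correct traces via rejection sampling, then uses a unigram negative log-likelihood fit on those traces as a hypothetical stand-in for "adaptation cost", since the brief does not say how the paper measures drift. `toy_student`, `rejection_sample`, and `unigram_nll` are hypothetical names; a real setup would sample from, and score with, the actual small model.

```python
import math
import random
from collections import Counter
from typing import Callable

# Hypothetical stand-in for the small model: returns (reasoning_trace, final_answer).
Sampler = Callable[[str, random.Random], tuple[str, str]]


def toy_student(question: str, rng: random.Random) -> tuple[str, str]:
    """Toy 'small model' that answers a+b correctly about half the time."""
    a, b = map(int, question.split("+"))
    guess = a + b if rng.random() < 0.5 else a + b + 1
    return f"add {a} and {b} to get {guess}", str(guess)


def rejection_sample(problems: dict[str, str], sample: Sampler,
                     k: int = 4, seed: int = 0) -> list[tuple[str, str]]:
    """Draw up to k traces per problem from the small model itself and keep
    the first one whose final answer matches the reference (others rejected)."""
    rng = random.Random(seed)
    kept = []
    for question, reference in problems.items():
        for _ in range(k):
            trace, answer = sample(question, rng)
            if answer == reference:
                kept.append((question, trace))
                break  # this variant keeps at most one correct trace per problem
    return kept


def unigram_nll(corpus: list[str], text: str) -> float:
    """Assumed drift proxy: mean per-token negative log-likelihood of `text`
    under an add-one-smoothed unigram model fit on `corpus` (the student's
    own traces). Higher means `text` is further from the student's style."""
    counts = Counter(tok for t in corpus for tok in t.split())
    vocab = set(counts) | set(text.split())
    total = sum(counts.values()) + len(vocab)
    tokens = text.split()
    return -sum(math.log((counts[t] + 1) / total) for t in tokens) / len(tokens)


problems = {"2+3": "5", "4+4": "8"}
kept = rejection_sample(problems, toy_student, k=8)
student_corpus = [trace for _, trace in kept]

student_style = kept[0][1]  # a self-generated, correct trace
oracle_style = "therefore the sum of 4 and 4 equals 8"  # hypothetical Oracle rewrite
print("kept:", kept)
print("student-style NLL:", round(unigram_nll(student_corpus, student_style), 3))
print("oracle-style NLL:", round(unigram_nll(student_corpus, oracle_style), 3))
```

Both traces are logically correct, but the Oracle-style rewrite uses vocabulary the student never produced, so it scores a higher NLL under the student-fitted model. That is the shape of the trade-off the paper describes, shown here with a deliberately crude unigram proxy.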

IMPACT Suggests that current approaches to improving small-model reasoning, such as refining training data with a stronger model, may be counterproductive and that data refinement strategies need re-evaluation.