AI Research: High-Quality Data Can Harm Small Model Math Reasoning

By PulseAugur Editorial · [1 source] · 2026-06-16 04:00

A new research paper identifies a "Quality-Utility Paradox" in distilling knowledge from powerful AI models to improve smaller models' mathematical reasoning. The study finds that data refined by a stronger "Oracle" model scores higher on quality metrics, yet it leads to worse small-model performance than data selected through rejection sampling. The authors attribute this to distributional drift: Oracle refinement pushes the training data away from the smaller model's own distribution, which increases the smaller model's adaptation cost. To address this, the researchers propose "Style-Aligned Refinement," a method that balances logical repair against compatibility with the smaller model's native reasoning distribution, thereby improving the data's utility.
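The contrast between the two data sources can be illustrated with a toy sketch. This is not the paper's method or metric: it assumes rejection sampling means keeping only the small model's own traces whose final answer matches the reference, and it uses a hypothetical proxy for "adaptation cost," the mean per-token negative log-likelihood of the training traces under a smoothed unigram model fit to the small model's samples. The function names (`rejection_sample`, `unigram_model`, `adaptation_cost`) and the example traces are illustrative only, and Style-Aligned Refinement itself is not implemented here.

```python
import math
from collections import Counter


def rejection_sample(candidates, reference_answer):
    """Keep only the small model's own traces whose final answer matches the reference."""
    return [trace for trace, answer in candidates if answer == reference_answer]


def unigram_model(traces):
    """Fit an add-one-smoothed unigram model over whitespace tokens.

    Toy stand-in for the small model's native output distribution (an assumption,
    not how the paper models the student).
    """
    counts = Counter(tok for t in traces for tok in t.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # +1 reserves probability mass for unseen tokens

    def logprob(tok):
        return math.log((counts[tok] + 1) / (total + vocab))

    return logprob


def adaptation_cost(traces, logprob):
    """Hypothetical proxy: mean per-token negative log-likelihood under the student model."""
    tokens = [tok for t in traces for tok in t.split()]
    return -sum(logprob(tok) for tok in tokens) / len(tokens)


# Illustrative small-model samples for "what is 3 + 4?": (trace, final answer).
student_samples = [
    ("so 3 + 4 = 7 answer 7", "7"),
    ("so 3 + 4 = 8 answer 8", "8"),
    ("so 4 + 3 = 7 answer 7", "7"),
]
kept = rejection_sample(student_samples, "7")
student_lm = unigram_model([trace for trace, _ in student_samples])

# Illustrative "Oracle-refined" trace: correct, but written in a different style.
oracle_traces = ["By commutativity of addition, the sum equals seven; hence the result is 7"]

print("rejection-sampled cost:", round(adaptation_cost(kept, student_lm), 3))
print("oracle-refined cost:   ", round(adaptation_cost(oracle_traces, student_lm), 3))
```

In this toy setup the Oracle trace is just as correct, but most of its tokens are rare under the small model's distribution, so its proxy cost comes out higher than that of the rejection-sampled traces. That mirrors, in a very simplified form, the article's point that higher-quality data can still be harder for a small model to absorb.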

IMPACT Suggests that refining distillation data with stronger models, a common way to improve small-model reasoning, may be counterproductive, and that data refinement strategies need to be re-evaluated.

RANK_REASON Research paper published on arXiv detailing a new finding in AI model training.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Haolong Qian, Xianliang Yang, Yinuo ma, Lirong Che, Feng Lu, Ye Guo, Lei Song, Jiang Bian, Chun Yuan · 2026-06-16 04:00

The Quality-Utility Paradox: Why High-Reward Data Impairs Small Model Mathematical Reasoning

arXiv:2606.16152v1 Announce Type: new Abstract: Knowledge distillation from powerful reasoning models is widely used to improve Small Language Models (SLMs) on mathematical reasoning, often assuming that traces with higher reward model scores provide more useful supervision. We i…

