Researchers have developed a framework that treats data selection as a sequential decision-making problem and uses dynamic programming to find optimal selection sequences. The framework unifies existing methods such as Data Shapley, showing them to be approximations to the sequential problem, and analyzes how selection optimality degrades with utility curvature. A proposed bipartite graph-based surrogate enables scalable greedy selection with provable guarantees, and experiments on classical ML benchmarks and large-scale LLM fine-tuning data show significant improvements.
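To make the idea of greedy selection on a bipartite graph concrete, here is a minimal sketch. It is not the paper's surrogate, which the summary does not specify. It assumes a hypothetical weighted-coverage utility: training points on one side, validation points on the other, and a selected training point "covers" the validation points it is linked to. The names `greedy_select`, `edges`, and `weights` are illustrative only. Weighted coverage is monotone submodular, so greedy selection under a size budget carries the classic (1 - 1/e) approximation guarantee; whether the paper's guarantee takes this form is not stated in the summary.

```python
from typing import Dict, List, Set, Tuple


def greedy_select(
    edges: Dict[str, Set[str]],
    weights: Dict[str, float],
    k: int,
) -> Tuple[List[str], float]:
    """Greedily pick up to k training points that maximize weighted coverage.

    edges maps each training point to the validation points it is linked to
    (a hypothetical bipartite graph); weights gives each validation point's value.
    """
    covered: Set[str] = set()
    chosen: List[str] = []
    total = 0.0
    for _ in range(k):
        best, best_gain = None, 0.0
        # Iterate in sorted order so ties are broken deterministically.
        for point in sorted(edges):
            if point in chosen:
                continue
            # Marginal gain: weight of validation points not yet covered.
            gain = sum(weights[v] for v in edges[point] - covered)
            if gain > best_gain:
                best, best_gain = point, gain
        if best is None:  # no remaining point adds value
            break
        chosen.append(best)
        covered |= edges[best]
        total += best_gain
    return chosen, total


# Toy graph (made-up data, for illustration only).
edges = {"a": {"v1", "v2"}, "b": {"v2", "v3"}, "c": {"v3"}}
weights = {"v1": 1.0, "v2": 2.0, "v3": 3.0}
selection, utility = greedy_select(edges, weights, k=2)
```

On this toy graph the first pick is `b` (gain 5.0), after which `a` adds only the uncovered `v1` (gain 1.0) and `c` adds nothing, which illustrates the diminishing returns that make greedy selection effective for this kind of objective.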
IMPACT Provides a theoretical foundation for optimizing data selection in ML and LLM fine-tuning.
RANK_REASON Academic paper detailing a new theoretical framework and experimental results.
AI-generated summary · Google Gemini · from 1 source.