PulseAugur

New framework unifies data selection via sequential decision-making

Researchers have proposed a framework that treats data selection as a sequential decision-making problem and uses dynamic programming to find optimal selection sequences. The framework unifies existing data valuation methods such as Data Shapley, showing that they are approximations to the sequential problem, and analyzes how selection optimality degrades with the curvature of the utility function. A bipartite graph-based surrogate enables scalable greedy selection with provable guarantees, and the authors report improvements in experiments on classical ML benchmarks and on data selection for large-scale LLM fine-tuning.
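To make the idea of selecting data one point at a time concrete, here is a minimal sketch of greedy sequential selection on a toy coverage utility. It is not the paper's algorithm or surrogate: the `COVERS` graph, the `utility` function, and the function names are all hypothetical, chosen only to illustrate picking the point with the largest marginal gain at each step and comparing the result with an exhaustive search.

```python
from itertools import combinations

# Hypothetical bipartite graph (an assumption for illustration):
# each training point maps to the validation points it "covers".
COVERS = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6},
    "d": {1, 6},
}


def utility(selected):
    """Toy utility: number of validation points covered by the selection."""
    covered = set()
    for point in selected:
        covered |= COVERS[point]
    return len(covered)


def greedy_select(budget):
    """Build a selection sequence, adding the best marginal-gain point each step."""
    sequence = []
    remaining = sorted(COVERS)  # sorted so ties break deterministically
    for _ in range(budget):
        best = max(remaining, key=lambda p: utility(sequence + [p]))
        sequence.append(best)
        remaining.remove(best)
    return sequence


def best_subset(budget):
    """Exhaustive search over all subsets of a given size (tiny inputs only)."""
    return max(combinations(sorted(COVERS), budget), key=utility)


if __name__ == "__main__":
    for k in range(1, 4):
        seq = greedy_select(k)
        opt = best_subset(k)
        print(k, seq, utility(seq), opt, utility(opt))
```

On this toy graph the greedy sequence matches the exhaustive optimum at every budget, but that is a property of the example rather than a general guarantee. For monotone submodular utilities such as coverage, greedy selection is known to reach at least a (1 - 1/e) fraction of the optimum; how selection quality behaves for more general utilities is what the summarized paper analyzes.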

IMPACT Provides a theoretical foundation for optimizing data selection in ML and LLM fine-tuning.

RANK_REASON Academic paper detailing a new theoretical framework and experimental results.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Hongliang Chi, Qiong Wu, Zhengyi Zhou, Jonathan Light, Emily Dodwell, Yao Ma ·

    Unifying and Optimizing Data Values for Selection via Sequential Decision-Making

    arXiv:2502.04554v2 Announce Type: replace Abstract: Data selection has emerged as a crucial downstream application of data valuation, yet the theoretical foundations for using data values in selection remain underexplored. We reformulate data selection as a sequential decision-ma…