PulseAugur
Live 09:25:40

New methods offer efficient data valuation for LLMs and VLMs

Two new research papers propose methods for valuing training data for large models. The first, "For-Value," introduces an efficient forward-only framework for fine-tuning large language models (LLMs) and vision-language models (VLMs). It estimates data value from a single forward pass, avoiding computationally expensive backpropagation. The second, "Utility-Aware Data Pricing," presents a dynamic, utility-based pricing model for LLM training data. It quantifies data's contribution at the token level and combines empirical training gains with cryptographic verifiability to support a transparent data market.
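The following is a minimal sketch of the general idea behind forward-only valuation, under stated assumptions. It is not the For-Value algorithm, because the excerpt does not give that paper's scoring rule. The sketch uses a standard identity. If only a final linear softmax layer (logits = W·h) is considered, the cross-entropy gradient with respect to W is the outer product of the output error (p - onehot(y)) and the hidden state h. The inner product of two such gradients therefore factorizes into (error · error) × (hidden · hidden), and both factors come from forward passes alone. The function names, the toy weights, and the restriction to the last layer are assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def output_error(hidden, weights, label):
    # Forward pass through an ASSUMED final linear + softmax layer: logits = W h.
    logits = [dot(row, hidden) for row in weights]
    probs = softmax(logits)
    # p - onehot(label) is the cross-entropy gradient with respect to the logits.
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

def forward_only_score(train_hidden, train_label, val_hidden, val_label, weights):
    # For this layer dL/dW = e h^T, so the Frobenius inner product of two
    # gradients equals (e_train . e_val) * (h_train . h_val): no backward pass.
    e_train = output_error(train_hidden, weights, train_label)
    e_val = output_error(val_hidden, weights, val_label)
    return dot(e_train, e_val) * dot(train_hidden, val_hidden)

# Hypothetical toy data: 3 classes, 2-dimensional hidden states.
W = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]
val_h, val_y = [0.5, -1.0], 0
train = {"ex_a": ([1.0, 2.0], 0), "ex_b": ([0.6, -0.9], 0)}
scores = {name: forward_only_score(h, y, val_h, val_y, W) for name, (h, y) in train.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A positive score means that, to first order, a gradient step on the training example would also lower the validation loss. A negative score means the step would raise it. This is the intuition behind gradient-similarity valuation, obtained here without backpropagation.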

Impact: These data valuation techniques could make LLM training more efficient and data markets fairer by pricing data according to its measured utility.

Ranking rationale: Two academic papers posted on arXiv introduce new methods for data valuation in LLMs.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CL · TIER_1 · English (EN) · Wenlong Deng, Qi Zeng, Jiaming Zhang, Minghui Chen, Zixin Ding, Christos Thrampoulidis, Boying Gong, Xiaoxiao Li

    For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs

    arXiv:2508.10180v3 · Announce type: replace · Abstract: Data valuation is essential for enhancing the transparency and accountability of large language models (LLMs) and vision-language models (VLMs). However, existing methods typically rely on gradient computations, making them comp…

  2. arXiv cs.LG · TIER_1 · English (EN) · Minghui Xu, Qi Luo, Kun Li

    Utility-Aware Data Pricing: Token-Level Quality and Empirical Training Gain for LLMs

    arXiv:2604.22893v1 · Announce type: new · Abstract: Traditional data valuation methods based on "row-count × quality coefficient" paradigms fail to capture the nuanced, nonlinear contributions that data makes to Large Language Model (LLM) capabilities. This paper presents a …
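The contrast drawn in the second abstract can be shown with a toy example. This sketch is hypothetical and is not the paper's pricing model, which the excerpt does not describe. It compares the "row-count × quality coefficient" baseline named in the abstract with a made-up utility rule. That rule splits a fixed budget in proportion to each dataset's measured training gain, weighted by its mean per-token quality score. The `Dataset` fields, the scores, and the proportional-split rule are all assumptions. The paper's cryptographic verifiability component is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Dataset:
    name: str
    rows: int
    token_quality: list   # HYPOTHETICAL per-token quality scores in [0, 1]
    training_gain: float  # HYPOTHETICAL measured gain, e.g. validation-loss reduction

def row_count_price(d, quality_coeff, unit_price):
    # Baseline paradigm named in the abstract: price grows linearly with row count.
    return d.rows * quality_coeff * unit_price

def utility_prices(datasets, budget):
    # Toy utility (an assumption): non-negative training gain x mean token quality.
    utilities = {}
    for d in datasets:
        mean_q = sum(d.token_quality) / len(d.token_quality) if d.token_quality else 0.0
        utilities[d.name] = max(d.training_gain, 0.0) * mean_q
    total = sum(utilities.values())
    if total == 0.0:
        return {name: 0.0 for name in utilities}
    # Split the budget in proportion to each dataset's share of total utility.
    return {name: budget * u / total for name, u in utilities.items()}

clean = Dataset("clean", rows=1000, token_quality=[0.9, 0.8, 1.0], training_gain=0.05)
noisy = Dataset("noisy", rows=1000, token_quality=[0.2, 0.1, 0.3], training_gain=0.05)

print(row_count_price(clean, 0.5, 0.01), row_count_price(noisy, 0.5, 0.01))
print(utility_prices([clean, noisy], budget=100.0))
```

Under the baseline, two datasets with the same size and coefficient receive the same price. The toy utility rule separates them (roughly 82 versus 18 of the 100-unit budget here) because it looks at measured gain and token-level quality rather than row count.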