New DUI method quantifies dataset usage without shadow models

By PulseAugur Editorial · [1 source] · 2026-06-26 04:00

A new framework for Dataset Usage Inference (DUI) estimates what proportion of a dataset was used to train a machine learning model, without requiring shadow models or held-out data. Instead, the method generates synthetic non-member samples and frames DUI as a mixture proportion estimation problem: the owner's dataset is treated as a mix of samples the model was trained on (members) and samples it never saw (non-members), and the goal is to estimate the members' share. Experiments on large image generative models demonstrate its effectiveness at quantifying dataset usage, giving data owners a practical way to check how much of their data a model used.
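To make the mixture-proportion framing concrete, here is a minimal Python sketch of one textbook way to estimate a mixture proportion. It is not the paper's method: the summary does not describe the authors' scoring function, their estimator, or how they generate synthetic non-members. The sketch assumes, hypothetically, that every sample gets a per-sample loss from the target model, that training members tend to have lower loss than non-members, and that losses on synthetic non-members are a fair stand-in for real non-members. The function name `estimate_member_fraction` and its `min_tail` parameter are illustrative inventions.

```python
from typing import Sequence


def estimate_member_fraction(
    mixed_scores: Sequence[float],
    nonmember_scores: Sequence[float],
    min_tail: int = 5,
) -> float:
    """Estimate the share of `mixed_scores` that comes from training members.

    Hypothetical setup (not the paper's algorithm):
    - scores are per-sample losses from the target model; members tend to
      have LOW loss, non-members HIGH loss;
    - `nonmember_scores` are losses on synthetic samples the model never saw.

    Model the owner's data as  mix = p * members + (1 - p) * nonmembers.
    For any high-loss tail S = {score >= t}, P_mix(S) >= (1 - p) * P_non(S),
    so P_mix(S) / P_non(S) is an upper bound on the non-member share (1 - p).
    The smallest ratio over all thresholds is the tightest such bound.
    """
    if not mixed_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    n_mix, n_non = len(mixed_scores), len(nonmember_scores)

    best_ratio = 1.0  # stays 1.0 (estimate 0.0) if no threshold qualifies
    for t in sorted(set(nonmember_scores)):
        non_tail = sum(s >= t for s in nonmember_scores)
        if non_tail < min_tail:
            continue  # skip tiny tails whose ratio would be too noisy
        mix_tail = sum(s >= t for s in mixed_scores)
        ratio = (mix_tail / n_mix) / (non_tail / n_non)
        best_ratio = min(best_ratio, ratio)

    # best_ratio estimates the non-member share; the remainder are members.
    return max(0.0, 1.0 - best_ratio)
```

For example, with `nonmember_scores = list(range(10, 20))` and an owner dataset made of five members at loss 1 plus those same ten non-member losses, every qualifying tail ratio equals 10/15, so the sketch returns roughly 1/3, the true member share. The minimum over thresholds is taken because each ratio only bounds the non-member share from above; `min_tail` trades that tightness against noise from very small tails.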

IMPACT Gives data owners a practical way to estimate how much of their data an ML model was trained on, with potential implications for data licensing and privacy.

RANK_REASON The cluster contains an academic paper detailing a new method for dataset usage inference.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.LG TIER_1 English(EN) · Wojciech Łapacz, Stanisław Pawlak, Jan Dubiński, Franziska Boenisch, Adam Dziedzic · 2026-06-26 04:00

Dataset Usage Inference without Shadow Models or Held-out Data

arXiv:2606.26257v1 · Announce Type: new

Abstract: How much of my data was used to train a machine learning model? Dataset Usage Inference (DUI) aims to answer this by estimating what fraction of a dataset contributed to a model's training. However, existing DUI methods rely on assu…