A new framework for Dataset Usage Inference (DUI) aims to determine what proportion of a dataset was used to train a machine learning model, without requiring shadow models or held-out data. The method generates synthetic non-member samples and frames DUI as a mixture proportion estimation problem. Experiments on large image generative models demonstrate its effectiveness in quantifying dataset usage, offering a practical solution for data owners.
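To illustrate what "mixture proportion estimation" means here, the sketch below uses a classical tail-ratio estimator from the MPE literature. It is not the paper's algorithm, which this summary does not describe. It treats the owner's dataset as a mixture of members and non-members, with the synthetic samples standing in for the non-member component. It assumes a hypothetical per-sample score, such as model loss, where non-members dominate the high-score tail. The non-member weight is then estimated as the smallest ratio of dataset tail mass to non-member tail mass, and the usage proportion is one minus that weight. The function name `estimate_usage_proportion` and the `min_tail` parameter are illustrative assumptions.

```python
import random
from bisect import bisect_left


def estimate_usage_proportion(dataset_scores, nonmember_scores, min_tail=500):
    """Hypothetical tail-ratio MPE sketch (not the paper's method).

    Assumes higher scores (e.g. per-sample loss) are more non-member-like,
    and that nonmember_scores (e.g. from synthetic non-members) follow the
    non-member score distribution. Returns the estimated fraction of the
    dataset that was used in training (the member fraction).
    """
    d = sorted(dataset_scores)
    g = sorted(nonmember_scores)
    n_d, n_g = len(d), len(g)
    best = 1.0  # estimated weight of the non-member component
    for t in g:
        # Empirical upper-tail counts for the event score >= t.
        g_tail = n_g - bisect_left(g, t)
        if g_tail < min_tail:
            continue  # skip tiny tails, where the ratio is too noisy
        d_tail = n_d - bisect_left(d, t)
        ratio = (d_tail / n_d) / (g_tail / n_g)
        best = min(best, ratio)
    # Whatever is not explained by the non-member component is "used" data.
    return 1.0 - best


if __name__ == "__main__":
    random.seed(0)
    # Toy scores: members have low loss, non-members are spread over [0, 1].
    reference = [random.random() for _ in range(50000)]
    members = [random.uniform(0.0, 0.5) for _ in range(15000)]
    others = [random.random() for _ in range(35000)]
    est = estimate_usage_proportion(members + others, reference, min_tail=2000)
    print(f"estimated usage proportion: {est:.3f}")  # true value here is 0.3
```

Taking the minimum over many noisy ratios biases the estimate slightly upward, which is why the sketch ignores tails with fewer than `min_tail` reference samples. If no threshold qualifies, it returns 0.0.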
IMPACT Provides a practical tool for data owners to determine data usage in ML models, potentially impacting data licensing and privacy.
RANK_REASON The cluster contains an academic paper detailing a new method for dataset usage inference.
- arXiv
- Dataset Usage Inference
- held-out data
- Hugging Face
- image generative models
- machine learning
- shadow models
AI-generated summary · Google Gemini · from 1 source.