A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective
Researchers have taken a data-centric approach to studying memorization in tabular diffusion models and found that a small subset of training samples contributes disproportionately to privacy risk. They also found that these highly memorized samples can be identified early in training. To mitigate the problem, they propose DynamicCut, a method that prunes these high-intensity samples and then retrains the model. This reduces memorization without significantly harming data diversity or downstream task performance.
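The pruning idea can be illustrated with a minimal Python sketch. It assumes memorization is measured with a nearest-neighbor distance-ratio test: a generated row counts as a copy of its closest training row when that distance is less than one third of the distance to the second-closest training row. This criterion is common in tabular memorization work, but it is an assumption here and may not be the paper's exact metric. The per-sample count stands in for "memorization intensity". The function names `memorization_counts` and `prune_top_fraction` and the `frac` parameter are hypothetical and do not come from the DynamicCut implementation.

```python
import math
from typing import List, Sequence

Row = Sequence[float]


def _dist(a: Row, b: Row) -> float:
    """Euclidean distance between two numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def memorization_counts(train: List[Row], generated: List[Row],
                        ratio: float = 1 / 3) -> List[int]:
    """Count, for each training row, how many generated rows copy it.

    ASSUMPTION: a generated row is flagged as a copy of its nearest
    training row when d(nearest) < ratio * d(second nearest).
    """
    if len(train) < 2:
        raise ValueError("need at least two training rows")
    counts = [0] * len(train)
    for g in generated:
        # Sort training rows by distance to this generated row.
        dists = sorted((_dist(g, t), i) for i, t in enumerate(train))
        (d1, i1), (d2, _) = dists[0], dists[1]
        if d1 < ratio * d2:
            counts[i1] += 1
    return counts


def prune_top_fraction(train: List[Row], scores: List[int],
                       frac: float) -> List[Row]:
    """Drop the `frac` share of training rows with the highest scores.

    Hypothetical stand-in for the pruning step; the model would then be
    retrained on the returned rows (retraining is not shown).
    """
    k = int(frac * len(train))
    # Stable sort keeps the original order among equal scores.
    ranked = sorted(range(len(train)), key=lambda i: scores[i], reverse=True)
    drop = set(ranked[:k])
    return [row for i, row in enumerate(train) if i not in drop]


if __name__ == "__main__":
    train = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
    # Pretend these came from an early training checkpoint.
    generated = [(0.1, 0.0), (0.0, 0.2), (10.0, 9.9), (5.0, 5.0)]
    scores = memorization_counts(train, generated)
    print(scores)  # [2, 0, 0, 1]
    print(prune_top_fraction(train, scores, 0.25))
```

In this toy run, two generated rows sit almost on top of `(0, 0)`, so that training row receives the highest count and is the one removed. The equidistant row `(5, 5)` is not flagged. Drawing the generated rows from an early checkpoint reflects the finding that highly memorized samples can be identified early in training.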
IMPACT: Offers a new technique to enhance privacy in generative models for tabular data, potentially improving trust and adoption.