A new research paper published on arXiv highlights significant inconsistencies in how Jensen-Shannon divergence is estimated for synthetic tabular data. The study reveals that different estimation protocols can lead to non-comparable divergence values, with marginal-based estimators often underestimating divergence by ignoring dependencies, while classifier-based estimators capture joint structure but are sensitive to the specific estimator used. The researchers propose a posterior correction for classifier-based estimation and offer practical guidelines and an open-source tool to address these protocol dependencies for more meaningful comparisons.
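To make the contrast between marginal-based and joint estimation concrete, here is a minimal sketch on a toy two-column binary table. The distributions `real` and `synth` and the helper names `jsd` and `classifier_jsd` are illustrative assumptions, not taken from the paper or its tool. The classifier form uses the standard identity for a Bayes-optimal balanced-class discriminator, not the paper's corrected estimator, whose details the summary does not give.

```python
import numpy as np


def jsd(p, q):
    """Jensen-Shannon divergence (in nats) between two discrete distributions."""
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def classifier_jsd(p, q):
    """JSD via the Bayes-optimal posterior D(x) = p(x) / (p(x) + q(x)).

    Uses the identity JSD = log 2 + 0.5*E_p[log D] + 0.5*E_q[log(1 - D)],
    which holds for balanced classes and an ideal classifier. A trained
    classifier only approximates D, which is one source of estimator
    sensitivity.
    """
    p = np.asarray(p, dtype=float).ravel()
    q = np.asarray(q, dtype=float).ravel()
    total = p + q
    pm, qm = p > 0, q > 0
    term_p = np.sum(p[pm] * np.log(p[pm] / total[pm]))
    term_q = np.sum(q[qm] * np.log(q[qm] / total[qm]))
    return float(np.log(2) + 0.5 * term_p + 0.5 * term_q)


# Hypothetical "real" data: two perfectly correlated binary columns.
real = np.array([[0.5, 0.0],
                 [0.0, 0.5]])
# Hypothetical "synthetic" data: identical marginals, columns independent.
synth = np.array([[0.25, 0.25],
                  [0.25, 0.25]])

# Marginal protocol: compare each column on its own.
marginal_jsd = [
    jsd(real.sum(axis=1), synth.sum(axis=1)),  # first column
    jsd(real.sum(axis=0), synth.sum(axis=0)),  # second column
]
# Joint protocol: compare the full two-column distribution.
joint_jsd = jsd(real, synth)
ideal_classifier_jsd = classifier_jsd(real, synth)

print("marginal:", marginal_jsd)
print("joint:", round(joint_jsd, 4))
print("ideal classifier:", round(ideal_classifier_jsd, 4))
```

In this toy case both per-column divergences are zero while the joint divergence is clearly positive, so a marginal-only protocol would report the two tables as identical even though the synthetic table has lost the dependency between its columns.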
IMPACT Shows that divergence scores for synthetic tabular data are not comparable across estimation protocols, which affects how synthetic data quality is evaluated and benchmarked.
RANK_REASON Research paper published on arXiv detailing a technical finding about data divergence estimation.
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Gotit.pub
- Hugging Face
- IArxiv
- Influence Flower
- Jensen-Shannon divergence
- ScienceCast
AI-generated summary · Google Gemini · from 1 source.