Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 8h

Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction

Researchers have developed a new metric that predicts how useful synthetic data will be for computer vision tasks, particularly when positive samples are scarce. The method works in the embedding space of a pre-trained foundation model and uses difference vectors between samples to assess whether the synthetic data captures task-relevant directions. Its effectiveness was demonstrated by a strong correlation with the performance of CNNs trained on mixed real and synthetic data, making it a practical tool for evaluating synthetic data quality.
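The brief gives no formulas, so the following is only a hedged sketch of how a "discriminative span" style score could be computed, assuming embeddings have already been extracted as plain lists of floats. The stand-in classifier (a nearest-centroid, mean-difference direction), the choice of difference vectors (each synthetic positive minus the real negative centroid), and the name `discriminative_span_score` are hypothetical illustrations, not the paper's method. The score is the fraction of the classifier direction's squared norm that can be reconstructed from the span of the synthetic difference vectors: 1.0 means the synthetic data spans the task-relevant direction, 0.0 means it is orthogonal to it.

```python
# Hypothetical sketch, NOT the paper's method: the classifier and the
# difference-vector construction below are illustrative assumptions.
import math
from typing import List, Sequence

Vector = List[float]


def _dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    # Component-wise mean of a non-empty list of equal-length vectors.
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def _orthonormal_basis(vectors: Sequence[Sequence[float]], tol: float = 1e-10) -> List[Vector]:
    # Modified Gram-Schmidt: keep only directions not already in the span.
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = _dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = math.sqrt(_dot(w, w))
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis


def discriminative_span_score(real_pos, real_neg, synth_pos) -> float:
    """Hypothetical score in [0, 1]: share of the squared norm of a
    mean-difference classifier direction lying in span(synthetic diffs)."""
    # Assumed task-relevant direction: weight vector of a nearest-centroid classifier.
    neg_centroid = _mean(real_neg)
    w = [p - q for p, q in zip(_mean(real_pos), neg_centroid)]
    w_sq = _dot(w, w)
    if w_sq == 0.0:
        raise ValueError("positive and negative centroids coincide")
    # Assumed difference vectors: each synthetic positive minus the real negative centroid.
    diffs = [[s - q for s, q in zip(x, neg_centroid)] for x in synth_pos]
    # Squared norm of w's projection onto span(diffs) = sum of squared basis coefficients.
    basis = _orthonormal_basis(diffs)
    captured = sum(_dot(w, b) ** 2 for b in basis)
    return captured / w_sq


if __name__ == "__main__":
    real_pos = [[1.0, 0.0], [3.0, 0.0]]
    real_neg = [[0.0, 0.0]]
    # A synthetic sample on the diagonal recovers half of the classifier direction.
    print(round(discriminative_span_score(real_pos, real_neg, [[1.0, 1.0]]), 6))  # 0.5
```

Projecting onto an orthonormal basis keeps the score well defined even when synthetic samples are redundant: near-duplicate difference vectors add no new basis directions, so they cannot inflate the score.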

Radhika Desai