Discriminative Span as a Predictor of Synthetic Data Utility via Classifier Reconstruction
Researchers have developed a new metric that predicts how useful synthetic data will be for computer vision tasks, particularly when few positive samples are available. The method works in the embedding space of a pre-trained foundation model and uses difference vectors between samples to assess whether synthetic data captures task-relevant directions. Its effectiveness was demonstrated by a strong correlation between the metric and the performance of CNNs trained on mixed real and synthetic data, making it a practical tool for evaluating synthetic data quality.
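The summary does not give the metric's exact formula, so the following is only a minimal sketch, under stated assumptions, of how "difference vectors spanning task-relevant directions" could be scored. It assumes that a linear classifier direction estimated on real embeddings (here approximated by the difference of class means) stands in for the task-relevant direction. It also assumes that "reconstruction" means projecting that direction onto the top-k principal directions of the synthetic positive-minus-negative difference vectors. The names `span_score`, `class_mean_direction`, `difference_vectors`, `toy_set`, and the parameter `k` are hypothetical, and random vectors stand in for real foundation-model embeddings.

```python
import numpy as np


def class_mean_direction(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """Proxy linear classifier (an assumption): unit vector from the
    negative-class mean embedding to the positive-class mean embedding."""
    w = pos.mean(axis=0) - neg.mean(axis=0)
    return w / np.linalg.norm(w)


def difference_vectors(pos: np.ndarray, neg: np.ndarray) -> np.ndarray:
    """All pairwise differences pos[i] - neg[j], shape (len(pos) * len(neg), d)."""
    return (pos[:, None, :] - neg[None, :, :]).reshape(-1, pos.shape[1])


def span_score(real_pos, real_neg, synth_pos, synth_neg, k: int = 1) -> float:
    """Hypothetical score in [0, 1]: squared norm of the projection of the
    real-data classifier direction onto the top-k directions spanned by the
    synthetic difference vectors. 1.0 means the synthetic differences fully
    reconstruct the real discriminative direction; 0.0 means they miss it."""
    w = class_mean_direction(real_pos, real_neg)
    d_synth = difference_vectors(synth_pos, synth_neg)
    # Rows of vt are orthonormal; the first k are the dominant directions of d_synth.
    _, _, vt = np.linalg.svd(d_synth, full_matrices=False)
    coeffs = vt[:k] @ w
    return float(np.sum(coeffs ** 2))


# Toy usage: random vectors stand in for foundation-model embeddings.
rng = np.random.default_rng(0)
dim, n = 16, 20


def toy_set(axis=None):
    """n noisy embeddings, shifted by +1 along `axis` when one is given."""
    x = 0.1 * rng.normal(size=(n, dim))
    if axis is not None:
        x[:, axis] += 1.0
    return x


real_pos, real_neg = toy_set(axis=0), toy_set()
# Synthetic positives that differ from negatives along the real task direction.
good = span_score(real_pos, real_neg, toy_set(axis=0), toy_set())
# Synthetic positives that differ along an unrelated direction.
bad = span_score(real_pos, real_neg, toy_set(axis=1), toy_set())
print(f"good={good:.3f} bad={bad:.3f}")
```

In this toy setup the synthetic set that varies along the same axis as the real positives scores close to 1, while the set that varies along an unrelated axis scores close to 0. Restricting the projection to the top-k directions matters: with enough generic difference vectors, their full span would cover the whole embedding space and every synthetic set would score 1.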