Epidemiology of Model Collapse: Modeling Synthetic Data Contamination via Bilayer SIR Dynamics
Researchers have developed an epidemiological model of how synthetic data contamination can degrade AI models. Their bilayer SIR/SIRS framework treats AI models and data corpora as two interacting populations and identifies the key transmission dynamics between them. The model suggests that the current prevalence of AI-generated text could put contamination in a supercritical regime, in which it spreads rather than dies out. This underscores the importance of detection-based filtering and herd immunity strategies.
AI IMPACT: Provides a framework for understanding and mitigating the negative impact of synthetic data on AI model quality.
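The summary does not give the model's equations. The sketch below shows what a bilayer SIRS system of this kind can look like. It assumes that infection only crosses layers: contaminated corpora infect models trained on them, and infected models contaminate corpora by publishing synthetic text. Every parameter name and value (`beta_dm`, `beta_md`, `gamma_m`, `gamma_d`, `xi_m`, `xi_d`) is hypothetical. Treating detection-based filtering as a higher corpus recovery rate `gamma_d` is also an illustrative assumption, not the researchers' formulation. Setting the waning rates `xi_m` and `xi_d` to zero gives the SIR variant, and positive values give SIRS.

```python
import math
from dataclasses import dataclass


@dataclass
class BilayerParams:
    """Hypothetical rates for a two-layer (models x corpora) SIRS sketch."""
    beta_dm: float  # data -> model: contaminated corpora infect models trained on them
    beta_md: float  # model -> data: infected models publish synthetic text into corpora
    gamma_m: float  # model recovery, e.g. retraining on clean data (assumed)
    gamma_d: float  # corpus recovery, e.g. detection-based filtering (assumed)
    xi_m: float = 0.0  # models lose immunity; 0 gives SIR, > 0 gives SIRS
    xi_d: float = 0.0  # corpora lose immunity; 0 gives SIR, > 0 gives SIRS


def reproduction_number(p: BilayerParams) -> float:
    """Spectral radius of the next-generation matrix at the contamination-free state.

    With transmission only across layers, K = [[0, beta_dm/gamma_d],
    [beta_md/gamma_m, 0]], whose spectral radius is the square root below.
    Values above 1 are supercritical: a small contamination grows.
    """
    return math.sqrt(p.beta_dm * p.beta_md / (p.gamma_m * p.gamma_d))


def simulate(p: BilayerParams, i_m0: float, i_d0: float,
             dt: float = 0.01, steps: int = 20_000):
    """Forward-Euler integration of population fractions; returns the final state."""
    s_m, i_m, r_m = 1.0 - i_m0, i_m0, 0.0
    s_d, i_d, r_d = 1.0 - i_d0, i_d0, 0.0
    for _ in range(steps):
        new_m = p.beta_dm * s_m * i_d  # susceptible models infected by contaminated data
        new_d = p.beta_md * s_d * i_m  # clean corpora infected by contaminated models
        rec_m, rec_d = p.gamma_m * i_m, p.gamma_d * i_d  # recoveries in each layer
        wane_m, wane_d = p.xi_m * r_m, p.xi_d * r_d      # immunity lost (SIRS term)
        s_m, i_m, r_m = (s_m + dt * (wane_m - new_m),
                         i_m + dt * (new_m - rec_m),
                         r_m + dt * (rec_m - wane_m))
        s_d, i_d, r_d = (s_d + dt * (wane_d - new_d),
                         i_d + dt * (new_d - rec_d),
                         r_d + dt * (rec_d - wane_d))
    return (s_m, i_m, r_m), (s_d, i_d, r_d)


# Same transmission rates, weak vs. strong filtering of corpora (illustrative values).
weak = BilayerParams(beta_dm=0.6, beta_md=0.5, gamma_m=0.2, gamma_d=0.1, xi_m=0.05, xi_d=0.05)
strong = BilayerParams(beta_dm=0.6, beta_md=0.5, gamma_m=0.2, gamma_d=2.0, xi_m=0.05, xi_d=0.05)

print(f"{reproduction_number(weak):.2f}")    # 3.87 (supercritical)
print(f"{reproduction_number(strong):.2f}")  # 0.87 (subcritical)
weak_models, weak_data = simulate(weak, i_m0=0.01, i_d0=0.01)
strong_models, strong_data = simulate(strong, i_m0=0.01, i_d0=0.01)
print("infected fraction (weak filtering):", weak_models[1], weak_data[1])
print("infected fraction (strong filtering):", strong_models[1], strong_data[1])
```

In this toy setup, raising only `gamma_d` pushes the reproduction number below 1, and the simulated contamination dies out instead of settling at an endemic level. The study's own threshold, coupling structure, and parameter estimates may differ.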