A new paper titled "The FID Lottery" investigates how much the Fréchet Inception Distance (FID) varies when it is used to evaluate generative models. The study finds that retraining a model with a different seed can change its FID score three times more than simply redrawing samples from a fixed model. It attributes this variance to random initialization, data ordering, and flow-matching loss noise. The paper proposes a revised FID evaluation protocol that includes optimal guidance per cell, treats FID gaps below a ~1.3% coefficient of variation as inconclusive, and reports error bars over multiple training seeds instead of a single FID number.
AI IMPACT Highlights potential unreliability in current generative model evaluation, suggesting a need for more robust benchmarking practices.
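As a rough illustration of the reporting practice summarized above, the following Python sketch computes a mean, standard deviation, and coefficient of variation over per-seed FID scores, and flags a gap between two scores as inconclusive when it falls under a threshold. It is a minimal sketch, not the paper's implementation: the function names are hypothetical, the FID values in the usage example are made-up placeholders, and comparing the relative gap (measured against the mean of the two scores) to the ~1.3% figure is an assumed reading of the summary.

```python
import statistics


def summarize_fid(fids):
    """Return (mean, sample std, coefficient of variation) over per-seed FIDs.

    The coefficient of variation is the sample standard deviation divided
    by the mean. Requires at least two scores.
    """
    mean = statistics.mean(fids)
    std = statistics.stdev(fids)
    return mean, std, std / mean


def gap_is_inconclusive(fid_a, fid_b, cv_threshold=0.013):
    """Return True if the relative gap between two FIDs is below the threshold.

    Assumption: the gap is measured relative to the mean of the two scores,
    and 0.013 stands in for the ~1.3% figure from the summary.
    """
    rel_gap = abs(fid_a - fid_b) / ((fid_a + fid_b) / 2)
    return rel_gap < cv_threshold


if __name__ == "__main__":
    # Placeholder FID scores from five hypothetical training seeds.
    seed_fids = [2.31, 2.35, 2.28, 2.33, 2.30]
    mean, std, cv = summarize_fid(seed_fids)
    print(f"FID = {mean:.3f} +/- {std:.3f} (CV = {cv:.2%}, n = {len(seed_fids)})")
    print("gap inconclusive:", gap_is_inconclusive(2.30, 2.31))
```

Reporting the mean with its spread over seeds, rather than one number, makes it visible when two models differ by less than the seed-to-seed noise.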
AI-generated summary · Google Gemini · from 1 source.