New research reveals significant randomness in FID scores for generative models

By PulseAugur Editorial · 1 source · 2026-06-18 17:49

A new paper, "The FID Lottery," investigates how much the Fréchet Inception Distance (FID), the standard metric for evaluating generative models, varies between runs. The study finds that retraining a model with a different seed can shift its FID score three times as much as redrawing samples from a fixed model. The authors attribute this variance to random initialization, data ordering, and noise in the flow-matching loss. The paper proposes a revised FID evaluation protocol that uses the optimal guidance for each cell, treats FID gaps below a coefficient of variation of roughly 1.3% as inconclusive, and reports error bars over multiple training seeds instead of a single FID number.
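The reporting protocol described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `summarize_fid` and `compare` are hypothetical, and the decision rule (comparing the relative gap between two methods' mean FIDs against a 1.3% threshold) is only one possible reading of the "~1.3% coefficient of variation" criterion. Lower FID is better.

```python
from statistics import mean, stdev

# Assumed threshold: the ~1.3% figure from the summary, interpreted here as a
# relative gap between mean FIDs (an assumption, not the paper's exact rule).
INCONCLUSIVE_THRESHOLD = 0.013


def summarize_fid(scores: list[float]) -> tuple[float, float, float]:
    """Hypothetical helper: return (mean, sample std, coefficient of variation)
    for FID scores collected from one model per training seed."""
    if len(scores) < 2:
        raise ValueError("need at least two training seeds for an error bar")
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation (n - 1 denominator)
    return m, s, s / m


def compare(fid_a: list[float], fid_b: list[float],
            threshold: float = INCONCLUSIVE_THRESHOLD) -> str:
    """Hypothetical helper: compare two methods by mean FID (lower is better),
    declaring the result inconclusive when the relative gap is too small."""
    mean_a, _, _ = summarize_fid(fid_a)
    mean_b, _, _ = summarize_fid(fid_b)
    # Relative gap, normalized by the average of the two means.
    rel_gap = abs(mean_a - mean_b) / ((mean_a + mean_b) / 2)
    if rel_gap < threshold:
        return "inconclusive"
    return "A better" if mean_a < mean_b else "B better"
```

Each input list is meant to hold one FID score per independently retrained model, so the standard deviation reflects the retraining variance the paper highlights rather than sampling noise alone. Reporting the mean together with that standard deviation gives the kind of multi-seed error bar the protocol recommends.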

IMPACT: The findings highlight potential unreliability in current generative-model evaluation and suggest a need for more robust benchmarking practices.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV · Patrick Pérez · 2026-06-18 17:49

The FID Lottery: Quantifying Hidden Randomness in Generative-Model Evaluation

The Fréchet Inception Distance (FID) is the de facto arbiter of image generation, yet most papers report just a single number from a single trained model using a single sampling seed. How reproducible is that number if we retrain the model, or merely resample from it? In this pap…
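For reference, the standard definition of FID fits a Gaussian to Inception-v3 feature embeddings of real and generated images and reports the Fréchet distance between the two Gaussians:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Here \mu_r, \Sigma_r and \mu_g, \Sigma_g are the feature mean and covariance of the real and generated images, and lower values are better. Because \mu_g and \Sigma_g are estimated from a finite set of samples drawn from one trained model, both a new sampling seed and a retrained model change the score; these are the two sources of variation the paper compares.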