Brief · PulseAugur

TOOL · arXiv cs.CV English(EN) · 8h

Vision-Encoder Behavioral Fingerprints of Image-to-Image Generative Models: A Training-Paradigm-Driven Taxonomy of Six Commercial APIs

A new research paper introduces a method for classifying image-to-image generative models by training paradigm. Analyzing the behavioral fingerprints of six commercial APIs, including GPT-image-1, Gemini 2.5 Flash Image, and SDXL img2img, the study found that models trained with an edit-based approach cluster separately from text-to-image base models adapted at sampling time. The fingerprints come from a content-adaptive adversarial perturbation pipeline, with each output scored against a clean reference using a frozen DINOv2 ViT-B/14 token distance.
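The scoring step can be illustrated with a minimal sketch. The brief does not define the token distance, so this assumes it is the mean cosine distance between corresponding patch tokens of the clean reference and the model output. In practice the tokens would come from the frozen DINOv2 ViT-B/14 encoder; here they are toy vectors, and the function names `cosine_distance` and `token_distance` are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero-norm token embedding")
    return 1.0 - dot / (norm_u * norm_v)


def token_distance(ref_tokens, out_tokens):
    """Mean per-token cosine distance (an assumed definition, not the paper's).

    ref_tokens / out_tokens: lists of patch-token embeddings, one per patch,
    in the same patch order for the clean reference and the model output.
    """
    if not ref_tokens or len(ref_tokens) != len(out_tokens):
        raise ValueError("token sequences must be non-empty and equal length")
    total = sum(cosine_distance(r, o) for r, o in zip(ref_tokens, out_tokens))
    return total / len(ref_tokens)


# Toy example: two patches, two-dimensional embeddings.
reference = [[1.0, 0.0], [0.0, 1.0]]
output = [[0.0, 1.0], [0.0, 1.0]]  # first patch changed, second unchanged
score = token_distance(reference, output)
```

A score near 0 means the output's tokens match the reference; larger values indicate that the perturbation changed the output more. Under this assumption, a vector of such scores per API would serve as the fingerprint that is then clustered.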

IMPACT The taxonomy offers a way to group image-to-image generative models by training paradigm, which may aid in their evaluation and development.

Gemini 2.5 Flash Image
CelebA-HQ
GPT-image-1
COCO
Qwen Image Edit
Flux Kontext
SDXL img2img
SD3 img2img
DINOv2 ViT-B/14