SIGMA: Semantic-Difference Instruction-Grounding Mask Annotator for Text-Driven Image Manipulation Localization
Researchers have developed SIGMA, a novel method for automatically generating pixel-level masks for image manipulation localization (IML) datasets. SIGMA addresses the challenge of low-cost data acquisition by leveraging existing image editing datasets, which contain millions of original and edited image pairs. The system uses semantic-feature differencing within a vision foundation backbone and incorporates instruction-derived spatial priors through cross-modal refinement to accurately identify manipulation regions, even accounting for unintended side effects. SIGMA has demonstrated superior performance compared to existing mask generators and, when applied to public editing corpora, has created a substantial training set that significantly improves the performance of various IML detectors. AI