Decoupling Semantics from Distortions: Multi-Scale Two-Stream Vision-Language Alignment for AI-Generated Image Quality Assessment
Researchers have introduced MST-CLIPIQA, a multi-scale two-stream framework for assessing the quality of AI-generated images. The method decouples semantic understanding from perceptual sensitivity by using two CLIP encoders with different patch granularities, so that the model captures both global coherence and fine-grained artifact patterns. An adaptive fusion mechanism then combines the information from the two streams. The authors report state-of-the-art results on five benchmarks covering both image quality and text-image correspondence.
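The summary does not say how the adaptive fusion works, so the following is only a minimal sketch of one common form of adaptive fusion (softmax-gated weighting of two stream embeddings followed by a linear scoring head). It is not MST-CLIPIQA's actual method. The function names (`adaptive_fuse`, `quality_score`), the gating vectors, and the toy embeddings are all hypothetical; in a real system the embeddings would come from the two CLIP encoders and the weights would be learned.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def adaptive_fuse(
    semantic: Sequence[float],    # hypothetical embedding from the coarse/global stream
    distortion: Sequence[float],  # hypothetical embedding from the fine-grained stream
    gate_sem: Sequence[float],    # hypothetical learned gating vector for stream 1
    gate_dist: Sequence[float],   # hypothetical learned gating vector for stream 2
) -> Tuple[List[float], List[float]]:
    """Assumed gated fusion: each stream gets a scalar relevance logit,
    the logits are softmax-normalized, and the embeddings are mixed
    with those per-image weights."""
    if len(semantic) != len(distortion):
        raise ValueError("stream embeddings must have the same dimension")
    weights = softmax([dot(gate_sem, semantic), dot(gate_dist, distortion)])
    fused = [weights[0] * s + weights[1] * d for s, d in zip(semantic, distortion)]
    return fused, weights


def quality_score(fused: Sequence[float], head_w: Sequence[float], head_b: float) -> float:
    """Hypothetical linear regression head mapping the fused feature to a score."""
    return dot(head_w, fused) + head_b


if __name__ == "__main__":
    fused, weights = adaptive_fuse([1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0])
    print(weights, fused, quality_score(fused, [1.0, 1.0], 0.0))
```

With zero gating vectors both streams receive equal weight (0.5 each); a gate that responds more strongly to one stream's embedding shifts the mix toward that stream for that particular image, which is the intuition behind making the fusion "adaptive" rather than a fixed average.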
IMPACT: Establishes a new state of the art in AI-generated image quality assessment, potentially improving the evaluation of generative models.