Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 10h

Through the PRISM: Principle-Aware, Interpretable, and Multi-Scale Evaluation of Visual Designs

Researchers have introduced PRISM, a benchmark that evaluates visual design quality by testing how well AI models recognize and respond to violations of specific design principles, such as readability and contrast. It includes 110,000 perturbed designs for measuring model sensitivity to those violations. In initial tests, models such as Qwen-2.5-VL and GPT-4o-mini struggled with targeted degradations, while GPT-4o showed broader awareness but lacked fine-grained understanding. The team also proposes a framework for interpretable design assessment that uses multimodal models to give localized feedback and support targeted refinements.
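To make the idea of perturbation sensitivity concrete, here is a minimal Python sketch. It is not PRISM's code or protocol: the function names, the contrast-reducing perturbation, and the pairwise sensitivity metric are all illustrative assumptions, and the WCAG contrast ratio stands in for a real model's design score.

```python
from typing import Sequence, Tuple

RGB = Tuple[int, int, int]


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color, as defined in WCAG 2.x."""
    def channel(value: int) -> float:
        c = value / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (channel(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio between two colors (1.0 = identical, 21.0 = max)."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)


def reduce_contrast(fg: RGB, bg: RGB, strength: float) -> RGB:
    """Hypothetical perturbation: blend the foreground toward the background.

    strength=0.0 leaves fg unchanged; strength=1.0 makes it equal to bg.
    """
    return tuple(round(f + (b - f) * strength) for f, b in zip(fg, bg))


def sensitivity_rate(pairs: Sequence[Tuple[float, float]], margin: float = 0.0) -> float:
    """Fraction of (original_score, perturbed_score) pairs in which the
    perturbed design scores lower than the original by more than `margin`.

    This metric is an assumption for illustration, not the benchmark's own.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    hits = sum(1 for original, perturbed in pairs if original - perturbed > margin)
    return hits / len(pairs)


if __name__ == "__main__":
    text, background = (0, 0, 0), (255, 255, 255)
    degraded_text = reduce_contrast(text, background, strength=0.7)

    # Stand-in "model": score a design by its text/background contrast.
    original_score = contrast_ratio(text, background)
    perturbed_score = contrast_ratio(degraded_text, background)
    print(sensitivity_rate([(original_score, perturbed_score)]))
```

In a real evaluation, the scores would come from the multimodal model under test, and a low sensitivity rate on a given principle would suggest the model fails to notice that kind of violation.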

IMPACT Establishes a new evaluation standard for multimodal models, pushing for more interpretable and principle-aware AI in design applications.

GPT-4o
GPT-4o-mini
PRISM
Qwen-2.5-VL
Crello dataset