Preferences Order, Ratings Anchor: From Fused Expert Aesthetic Ground Truth to Self-Distillation
Researchers have developed PPaint, a new benchmark for image aesthetic assessment that collects both pairwise preferences and pointwise ratings from experts. Comparing the two protocols showed that preferences yield more consistent rankings, while ratings anchor the absolute score scale. By fusing the two signals, the researchers built a unified expert ground truth, then extended the same principle to training vision-language models (VLMs) without labels. A self-distillation method based on this approach significantly improved an open-source VLM's aesthetic scoring, matching a closed-source model's performance at lower inference cost.
AI IMPACT: Introduces a new benchmark and a label-free training method that significantly improve VLM aesthetic scoring, potentially impacting content generation and curation tools.
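
The "preferences order, ratings anchor" idea can be illustrated with a small sketch. The summary does not describe the actual fusion procedure, so everything here is an assumption made for illustration: the ordering is estimated with a Bradley-Terry model fitted to pairwise win counts, and the scale is taken from the sorted mean ratings, which are reassigned to images in preference order. The function names `bradley_terry` and `fuse` and the toy data are hypothetical.

```python
def bradley_terry(n_items, wins, iters=200):
    """Fit Bradley-Terry strengths with the standard MM update.

    wins[(i, j)] is the number of times item i was preferred over item j.
    (Assumed model; the source does not say how preferences are aggregated.)
    """
    strength = [1.0] * n_items
    for _ in range(iters):
        updated = []
        for i in range(n_items):
            total_wins = sum(c for (a, _), c in wins.items() if a == i)
            denom = 0.0
            for (a, b), c in wins.items():
                if a == i or b == i:
                    other = b if a == i else a
                    denom += c / max(strength[i] + strength[other], 1e-12)
            updated.append(total_wins / denom if denom > 0 else strength[i])
        # Normalise so the strengths keep a fixed overall scale.
        norm = sum(updated)
        strength = [s * n_items / norm for s in updated]
    return strength


def fuse(strength, mean_ratings):
    """Order items by preference strength, then hand out the sorted mean
    ratings in that order: preferences decide the ranking, ratings supply
    the absolute scale. (Illustrative rule, not the paper's method.)"""
    order = sorted(range(len(strength)), key=lambda i: strength[i])
    anchors = sorted(mean_ratings)
    fused = [0.0] * len(strength)
    for rank, item in enumerate(order):
        fused[item] = anchors[rank]
    return fused


# Toy data: preferences say 0 > 1 > 2, but mean ratings put item 1 above item 0.
wins = {(0, 1): 3, (1, 0): 1, (1, 2): 3, (2, 1): 1, (0, 2): 4}
mean_ratings = [6.0, 7.0, 3.0]
fused_scores = fuse(bradley_terry(3, wins), mean_ratings)
print(fused_scores)
```

In this toy case the fused scores keep the rating values 3.0, 6.0 and 7.0 but assign them in the order implied by the preferences, so the disagreement between the two protocols is resolved in favour of the pairwise ranking.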