Researchers re-examined preference alignment for image inpainting, applying the Direct Preference Optimization (DPO) framework with publicly available reward models. The study found that most reward models provide valid training signals, but some exhibit biases in brightness, composition, and color that lead to reward hacking. An ensemble of these reward models mitigates the biases and improves results on standard metrics and in human assessments, and the gains transfer to object removal tasks.
AI IMPACT Identifies biases in current reward models for image generation tasks and suggests ensemble methods for more robust and generalizable results.
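The summary does not say how the ensemble combines its reward models, so the following is only a minimal sketch of one plausible scheme, not the paper's method. It assumes each reward model scores the same set of candidate inpaintings, standardizes each model's scores so that no single scale dominates, averages them, and takes the best and worst candidates as the chosen/rejected pair for DPO. The function names (`zscores`, `ensemble_scores`, `preference_pair`) and the example scores are hypothetical.

```python
from statistics import mean, pstdev


def zscores(scores):
    """Standardize one reward model's scores across the candidates."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # A model that rates every candidate equally carries no signal.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def ensemble_scores(per_model_scores):
    """Average standardized scores over all reward models (assumed scheme)."""
    normalized = [zscores(scores) for scores in per_model_scores]
    n_candidates = len(normalized[0])
    return [mean(model[i] for model in normalized) for i in range(n_candidates)]


def preference_pair(per_model_scores):
    """Return (chosen_index, rejected_index) for one DPO training pair."""
    combined = ensemble_scores(per_model_scores)
    indices = range(len(combined))
    chosen = max(indices, key=combined.__getitem__)
    rejected = min(indices, key=combined.__getitem__)
    return chosen, rejected


if __name__ == "__main__":
    # Hypothetical scores for four candidates from three reward models.
    # The third model strongly favors candidate 3 (say, an overly bright image).
    scores = [
        [0.2, 0.9, 0.4, 0.3],
        [1.0, 4.0, 2.0, 1.5],
        [10.0, 12.0, 11.0, 30.0],
    ]
    print(preference_pair(scores))
```

In this toy example the third model alone would prefer candidate 3, while the averaged scores prefer candidate 1, which is the kind of bias dampening the summary attributes to ensembling. A real pipeline might weight models or use rank aggregation instead of a plain mean.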
RANK_REASON This is a research paper detailing a study on image inpainting techniques and reward models.
AI-generated summary · Google Gemini · from 1 source.