Taming I2V models for Image HOI Editing: A Cognitive Benchmark and Agentic Self-Correcting Framework
Researchers have introduced HOI-Edit, a benchmark for evaluating image editing of Human-Object Interactions (HOI). The benchmark spans three cognitive levels and comes with an automated metric, HOI-Eval, which assesses instance-level interactions by posing questions to a vision-language model. The study also proposes SCPE, a self-correcting framework that uses Image-to-Video (I2V) models and iteratively refines prompts to improve the accuracy of dynamic HOI editing.
IMPACT: This research introduces a specialized benchmark and framework for improving image editing capabilities related to human-object interactions, potentially advancing the realism and complexity of AI-generated visual content.
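
The summary describes two mechanisms only at a high level: a metric that scores an edit by asking a vision-language model questions about it, and a loop that refines the prompt until the edit is correct. The sketch below shows one way such a loop could be wired together. It is an illustration under stated assumptions, not the authors' implementation: the names `qa_score`, `self_correct`, `generate`, `ask_vlm`, and `refine` are hypothetical, the yes/no question format and the pass-fraction score are assumptions, and the real I2V model and vision-language model are replaced by caller-supplied functions.

```python
from typing import Callable, List, Tuple


def qa_score(answers: List[bool]) -> float:
    """Fraction of yes/no checks that passed; 0.0 when there are no checks."""
    return sum(answers) / len(answers) if answers else 0.0


def self_correct(
    instruction: str,
    generate: Callable[[str], object],                   # hypothetical: prompt -> edited result
    ask_vlm: Callable[[object, List[str]], List[bool]],  # hypothetical: result, questions -> answers
    refine: Callable[[str, List[str]], str],             # hypothetical: prompt, failed questions -> new prompt
    questions: List[str],
    max_rounds: int = 3,
) -> Tuple[object, str, float]:
    """Generate, check with question answering, and refine the prompt until all checks pass."""
    prompt = instruction
    best: Tuple[object, str, float] = (None, prompt, -1.0)
    for _ in range(max_rounds):
        result = generate(prompt)
        answers = ask_vlm(result, questions)
        score = qa_score(answers)
        # Keep the best-scoring attempt seen so far.
        if score > best[2]:
            best = (result, prompt, score)
        # Collect the questions the current result failed.
        failed = [q for q, ok in zip(questions, answers) if not ok]
        if not failed:
            break  # every check passed, stop early
        prompt = refine(prompt, failed)
    return best
```

With stub functions in place of the models, for example a `generate` that echoes the prompt and a `refine` that appends the missing interaction, the loop stops as soon as every question is answered positively and returns the best result, the prompt that produced it, and its score.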