Brief · PulseAugur

TOOL · arXiv cs.CV · 17h

Taming I2V models for Image HOI Editing: A Cognitive Benchmark and Agentic Self-Correcting Framework

Researchers have introduced HOI-Edit, a benchmark for evaluating image editing of Human-Object Interactions (HOI). The benchmark spans three cognitive levels and comes with an automated metric, HOI-Eval, which assesses instance-level interactions by having a vision-language model answer questions about the edited image. The study also proposes SCPE, a self-correcting framework that uses Image-to-Video (I2V) models and iteratively refines prompts to improve the accuracy of dynamic HOI editing.
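The generate, check, and refine cycle described above can be sketched roughly as follows. This is a minimal illustration, not the paper's SCPE implementation or the actual HOI-Eval metric: the function `self_correcting_edit`, its callables `generate`, `ask_vlm`, and `refine`, and the fraction-of-questions-passed score are all hypothetical names and assumptions introduced here, since the brief gives no details of the real interfaces.

```python
from typing import Callable, List, Tuple


def self_correcting_edit(
    image: str,
    instruction: str,
    questions: List[str],
    generate: Callable[[str, str], str],      # hypothetical: (image, prompt) -> edited image
    ask_vlm: Callable[[str, str], bool],      # hypothetical: (edited image, question) -> passed?
    refine: Callable[[str, List[str]], str],  # hypothetical: (prompt, failed questions) -> new prompt
    max_rounds: int = 3,
) -> Tuple[str, str, float]:
    """Return (best edited image, prompt that produced it, score in [0, 1])."""
    if not questions:
        raise ValueError("at least one verification question is required")

    prompt = instruction
    best: Tuple[str, str, float] = (image, prompt, -1.0)

    for _ in range(max_rounds):
        candidate = generate(image, prompt)

        # Assumed scoring rule: fraction of yes/no checks the VLM says are satisfied.
        failed = [q for q in questions if not ask_vlm(candidate, q)]
        score = 1.0 - len(failed) / len(questions)

        if score > best[2]:
            best = (candidate, prompt, score)
        if not failed:
            break  # every check passed, no further correction needed

        # Feed the failed checks back into the prompt for the next round.
        prompt = refine(prompt, failed)

    return best
```

Passing the generator, the question-answering model, and the prompt refiner in as callables keeps the sketch independent of any particular I2V or vision-language model, and the loop keeps the best-scoring candidate in case a later round makes the edit worse.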

IMPACT A dedicated benchmark and self-correcting framework for human-object interaction editing could make AI-generated images more realistic in how people and objects interact.

Nano Banana
HOI-Edit
HOI-Eval
Image-to-Video (I2V) models