Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 10h

CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences

Researchers have introduced CV-Arena, a benchmark for evaluating instruction-guided image editing. It comprises 12,000 real-image instruction pairs across 16 task types and aims to capture professional workflows beyond simple appearance edits. The work also proposes Active Elo, a human-AI collaborative preference protocol for scalable evaluation, and demonstrates the potential of agentic models such as CV-Agent for improved instruction following in visual editing.
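
For readers unfamiliar with Elo-style evaluation, the sketch below shows how a single pairwise preference can be turned into a rating update using the classic Elo formula. It is not the paper's Active Elo protocol, whose details this brief does not describe. The model names, starting ratings, and K-factor are hypothetical placeholders chosen for illustration.

```python
# Classic Elo update from one pairwise preference.
# Assumption: this is the textbook Elo formula, NOT the Active Elo
# protocol from the paper. All names and constants are illustrative.


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b).

    score_a is 1.0 if A was preferred, 0.0 if B was preferred,
    and 0.5 for a tie.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical models, both starting at the same rating.
ratings = {"model_x": 1000.0, "model_y": 1000.0}

# A judge (human or AI) prefers model_x's edit for one instruction.
ratings["model_x"], ratings["model_y"] = update_elo(
    ratings["model_x"], ratings["model_y"], score_a=1.0
)
print(ratings)
```

With equal starting ratings the expected score is 0.5, so the preferred model gains half the K-factor and the other loses the same amount.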

IMPACT: Proposes a new standard for evaluating complex image-editing tasks, which could drive advances in multimodal AI capabilities.

CV-Agent
CV-Arena
CV-Judge
CogRetriever