Composed Object Retrieval: Object-level Retrieval via Composed Expressions
Researchers have introduced Composed Object Retrieval (COR), a new task for retrieving individual objects within images using composed expressions. Unlike existing Composed Image Retrieval (CIR) methods, which match entire images, COR localizes the specific target object and grounds it with a pixel-level mask. The task requires models to reason jointly over visual and textual input to identify the desired modification to a reference object, even when visually similar distractors are present. To support the task, the researchers created a benchmark called COR125K, featuring over 125,000 retrieval triplets across numerous categories. The proposed CORE model is reported to deliver significant improvements over current CIR pipelines and baselines, establishing a foundation for fine-grained, object-level multimodal retrieval.
AI IMPACT This research could lead to more precise image search, improving applications that depend on fine-grained understanding of visual content.