Researchers have introduced Composed Object Retrieval (COR), a new task that enables object-level retrieval within images using composed expressions. Unlike existing Composed Image Retrieval (CIR) methods, which match entire images, COR localizes specific objects and grounds them with pixel-level masks. The task requires models to perform complex visual-textual reasoning to identify the desired modifications to reference objects, even in the presence of visually similar distractors. To support it, the authors created a new benchmark, COR125K, with over 125,000 retrieval triplets spanning numerous categories. Their proposed CORE model shows significant improvements over current CIR pipelines and baselines, establishing a new foundation for fine-grained, object-level multimodal retrieval.
IMPACT This research could lead to more precise and nuanced image search capabilities, improving applications that require fine-grained visual content understanding.
RANK_REASON The cluster describes a new research paper introducing a novel task and benchmark for object-level image retrieval.
- alphaXiv
- arXiv
- CatalyzeX
- Composed Image Retrieval
- Composed Object Retrieval
- COR125K
- DagsHub
- Hugging Face
- Tong Wang
AI-generated summary · Google Gemini · from 1 source.