Never Seen Before: Benchmarking Genuine Zero-Shot Composed Image Retrieval with Consistent Video-Sourced Datasets
Researchers have introduced ZeroSight, a new benchmark designed to evaluate Zero-Shot Composed Image Retrieval (ZS-CIR) more accurately. Existing benchmarks often draw on data that models have already seen during training, which inflates reported performance. ZeroSight uses video-sourced datasets and LLM-assisted captioning to create consistent reference-target pairs, ensuring a true zero-shot scenario. The researchers also propose SC4CIR, a method that identifies difficult negative targets and improves retrieval performance.
AI IMPACT: Establishes a more rigorous evaluation standard for zero-shot image retrieval, which could guide future model development.