Show, Don't Ask: Generative Visual Disambiguation for Composed Image Retrieval with Turn-Valid Coverage
Researchers have introduced CLARA, a framework for resolving ambiguity in composed image retrieval (CIR), where a query combines a reference image with a text modification. Earlier methods resolve ambiguity by asking clarifying questions in text, which requires the model to predict the user's textual answers. CLARA instead shows the user a small set of visual alternatives, and the user simply picks the image that best matches their intent. To keep its conformal coverage guarantees valid across multiple interaction rounds, CLARA reweights its calibration data according to the user's selections and ensures that every displayed prototype is grounded in a real image from the corpus.
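The summary does not give CLARA's calibration procedure, so the following is only a minimal sketch of generic weighted split conformal prediction, the standard way to reweight a calibration set while keeping a coverage guarantee. It assumes that nonconformity scores are already computed (for example, 1 minus a retrieval similarity) and that a user's selection has been turned into per-query calibration weights. The function names, weights and scores below are hypothetical illustrations, not CLARA's API or data.

```python
import math
from typing import List, Sequence


def weighted_conformal_threshold(
    cal_scores: Sequence[float],
    cal_weights: Sequence[float],
    test_weight: float,
    alpha: float,
) -> float:
    """Return the (1 - alpha) quantile of the weighted calibration scores.

    The test point's weight is placed at +infinity, as in weighted split
    conformal prediction; if the calibration mass alone cannot reach
    1 - alpha, the threshold is infinite (every candidate is kept).
    """
    if len(cal_scores) != len(cal_weights):
        raise ValueError("cal_scores and cal_weights must have the same length")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    total = sum(cal_weights) + test_weight
    # Compare unnormalized cumulative mass to the target to avoid rounding drift.
    target = (1.0 - alpha) * total
    cumulative = 0.0
    for score, weight in sorted(zip(cal_scores, cal_weights)):
        cumulative += weight
        if cumulative >= target:
            return score
    return math.inf


def prediction_set(candidate_scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of candidates whose nonconformity score is at most the threshold."""
    return [i for i, s in enumerate(candidate_scores) if s <= threshold]


# Hypothetical example: three calibration queries with nonconformity scores.
scores = [0.1, 0.5, 0.9]
uniform = weighted_conformal_threshold(scores, [1.0, 1.0, 1.0], 1.0, alpha=0.5)
# Hypothetical reweighting: the first calibration query resembles the user's pick.
reweighted = weighted_conformal_threshold(scores, [4.0, 1.0, 1.0], 1.0, alpha=0.5)
print(uniform, reweighted)
```

In this toy setting, the reweighting lowers the threshold from 0.5 to 0.1, so `prediction_set` keeps fewer candidates in the next round. How CLARA actually derives its weights from a selection is not stated in the summary.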
IMPACT: This research could improve the user experience and accuracy of image search applications by offering a more intuitive disambiguation process.