FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning
Researchers have developed FashionLens, a unified framework for versatile fashion image retrieval using Multimodal Large Language Models. This system addresses the limitations of existing approaches by supporting diverse query formats and search intentions. To achieve this, FashionLens incorporates a Proposal-Guided Spherical Query Calibrator for task-aligned metric spaces and a Gradient-Guided Adaptive Sampling strategy to balance optimization across varying task complexities. The framework demonstrates state-of-the-art performance on the new U-FIRE benchmark, which consolidates fragmented fashion datasets.
AI IMPACT This framework could significantly improve e-commerce search by enabling more nuanced and diverse fashion image retrieval.