ROGLE: Robust Global-Local Alignment with Automated Region Supervision for Text-Based Person Search
Researchers have developed ROGLE, a framework for text-based person search that addresses two limitations: weak fine-grained understanding and the scarcity of region-level annotations. An automated Region-to-Sentence Matching strategy generates pseudo region-sentence pairs for supervision, reducing the need for manual annotation. ROGLE integrates global contrastive learning with local alignment. The work also introduces the P-VLG Benchmark, a large dataset with over 100,000 annotated regions and long-form captions that supports both global and local evaluation.
AI IMPACT: Introduces an approach to improve fine-grained understanding in text-based person search, potentially benefiting surveillance and security applications.
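The summary does not say how Region-to-Sentence Matching or the global objective is computed, so the following is only a minimal sketch under stated assumptions, not ROGLE's actual method. It assumes that regions and sentences are already embedded in a shared space, that pseudo pairs are formed by cosine similarity with a cutoff, and that the global term is a symmetric InfoNCE loss. The function names, the threshold, and the temperature are all hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def match_regions_to_sentences(region_emb, sentence_emb, threshold=0.5):
    """Assumed pseudo-pair generation: pair each region with its most similar
    sentence, and keep the pair only if the cosine similarity reaches the
    (hypothetical) threshold.

    Returns a list of (region_index, sentence_index, similarity) tuples.
    """
    sim = l2_normalize(region_emb) @ l2_normalize(sentence_emb).T
    best = sim.argmax(axis=1)  # best sentence for every region
    pairs = []
    for r, s in enumerate(best):
        score = float(sim[r, s])
        if score >= threshold:
            pairs.append((r, int(s), score))
    return pairs


def symmetric_info_nce(image_emb, text_emb, temperature=0.07):
    """Assumed global contrastive term: symmetric InfoNCE over a batch in which
    row i of image_emb and row i of text_emb describe the same person."""
    logits = l2_normalize(image_emb) @ l2_normalize(text_emb).T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # subtract row max for stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))


if __name__ == "__main__":
    regions = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    sentences = np.array([[1.0, 0.0], [0.0, 1.0]])
    # The third region is equally close to both sentences and falls below 0.8,
    # so it yields no pseudo pair.
    print(match_regions_to_sentences(regions, sentences, threshold=0.8))
    print(symmetric_info_nce(np.eye(3), np.eye(3)))
```

In a setup like this, the same contrastive form could be applied to the pseudo region-sentence pairs to obtain a local alignment term alongside the global one. How ROGLE actually weights or combines the two is not stated in the summary.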