Flickr30K
PulseAugur coverage of Flickr30K — every cluster mentioning Flickr30K across labs, papers, and developer communities, ranked by signal.
-
New LARE framework enhances text-image retrieval by encoding low-attention regions
Researchers have introduced LARE (Low-Attention Region Encoding), a novel framework designed to improve text-image retrieval, particularly in complex scenes with many objects. LARE employs a dual-encoding strategy that …
-
New framework rectifies noisy cross-modal data using graph reasoning
Researchers have developed a new framework called Intra-modal Neighbor-aware Noise Rectification (IN2R) to improve the accuracy of cross-modal retrieval by addressing noise in large web-harvested datasets. Unlike previo…
-
New FAST-GOAL method enhances vision-language models for detailed text
Researchers have developed FAST-GOAL, an efficient fine-tuning method designed to improve the ability of vision-language models like CLIP to process lengthy and detailed text descriptions. The method employs two main co…
-
New VAGS method enhances AI image editing and generation quality
Researchers have introduced Velocity Adaptive Guidance Scale (VAGS), a novel method for improving image editing and generation quality. VAGS dynamically adjusts the guidance scale during the diffusion process, unlike tr…
-
EASE framework enables federated multimodal unlearning by addressing entanglement
Researchers have developed EASE, a new framework for federated multimodal unlearning that addresses the challenge of entangled knowledge across different data modalities and client updates. The method identifies three k…
-
Researchers find single hub text exploits vulnerabilities in CLIP cross-modal encoders
Researchers have identified a vulnerability in cross-modal encoders like CLIP, which map text and images into a shared embedding space. They discovered that a single "hub text" can generate high similarity scores with n…
-
New framework enhances federated cross-modal retrieval with missing modalities
Researchers have developed RCSR, a new framework designed to improve federated cross-modal retrieval, particularly when dealing with data heterogeneity and missing modalities across clients. The system utilizes a frozen…