Mscoco
PulseAugur coverage of Mscoco — every cluster mentioning Mscoco across labs, papers, and developer communities, ranked by signal.
-
New FAST-GOAL method enhances vision-language models for detailed text
Researchers have developed FAST-GOAL, an efficient fine-tuning method designed to improve the ability of vision-language models like CLIP to process lengthy and detailed text descriptions. The method employs two main co…
-
Google DeepMind unveils Gemini Embedding 2 multimodal model
Google DeepMind has introduced Gemini Embedding 2, a new native multimodal embedding model. This model can generate unified representations for video, audio, image, and text data, demonstrating strong zero-shot capabili…
-
Researchers unveil new stealthy backdoor attacks on AI models using diffusion and style features
Researchers have developed new methods for backdoor attacks on advanced AI models, specifically targeting Vision-Language Models (VLMs) and Diffusion Models (DMs). One approach, CBV, uses diffusion models to create natu…
-
Researchers find a single "hub text" can exploit vulnerabilities in CLIP-style cross-modal encoders
Researchers have identified a vulnerability in cross-modal encoders like CLIP, which map text and images into a shared embedding space. They discovered that a single "hub text" can generate high similarity scores with n…