New methods boost VLM robustness against adversarial attacks
By PulseAugur Editorial · [15 sources]
Researchers have developed new methods to improve the adversarial robustness of vision-language models (VLMs) like CLIP. SS-TPT uses stability and suitability scores to guide adaptation and inference, amplifying trustworthy views while suppressing corrupted ones. MAC employs multi-view counterattacks with corruption-aware soft weighting, adaptively scaling intensity based on estimated corruption. DBD leverages the observation that adversarial images shift in a dominant direction, using this 'Defense Direction' to recover robust representations and even surpass clean accuracy.
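The idea shared by SS-TPT and MAC, giving trustworthy augmented views more influence and corrupted views less, can be illustrated with a small sketch. This is not the formulation from either paper, which the summary above does not specify. It assumes a hypothetical per-view `trust_scores` value (higher means more trustworthy) and turns those scores into soft weights with a softmax before averaging the per-view class probabilities. The function names and the `temperature` parameter are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Convert raw scores into weights that sum to 1."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(
    view_probs: Sequence[Sequence[float]],
    trust_scores: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Fuse per-view class probabilities with soft, score-based weights.

    view_probs[i][c] is the probability of class c predicted from view i.
    trust_scores[i] is a hypothetical trust estimate for view i; how such
    a score would be computed is an assumption left outside this sketch.
    """
    if len(view_probs) != len(trust_scores):
        raise ValueError("need exactly one trust score per view")
    weights = softmax(trust_scores, temperature)
    n_classes = len(view_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, view_probs))
        for c in range(n_classes)
    ]


# Two views agree on class 0; a third, low-trust view votes for class 2.
view_probs = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.1, 0.1, 0.8],
]
trust_scores = [2.0, 1.5, -3.0]  # made-up values for illustration

fused = aggregate_views(view_probs, trust_scores)
predicted_class = fused.index(max(fused))
```

With a plain unweighted average, the low-trust view would contribute a third of the final prediction. Soft weighting shrinks its contribution without discarding it outright, and a lower `temperature` makes the suppression sharper.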
AI
IMPACT
These advancements in adversarial robustness are crucial for deploying vision-language models safely in real-world applications.
arXiv:2606.11409v1 Announce Type: cross Abstract: Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of …
arXiv:2606.10571v1 Announce Type: cross Abstract: Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key property is cross-model transferability, which enables transfer-based black-box attacks. However, existing attacks often rely heavily on t…
arXiv cs.CL
TIER_1 · English (EN) · Eitan Cohen, Idan Simai, Uri Shaham
arXiv:2606.10610v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited training data. We propose SDBN (Small Data Big Noise), …
arXiv cs.AI
TIER_1 · English (EN) · Hannah Gao (Massachusetts Institute of Technology), Isha Agarwal (Massachusetts Institute of Technology), Dylan Hadfield-Menell (Massachusetts Institute of Technology), Rachel Ma (Massachusetts Institute of Technology)
arXiv:2606.07593v1 Announce Type: cross Abstract: The widespread use of image classification models in high-risk, real-world situations necessitates making these models robust to slight disturbances or perturbations, such as blurring or sharpening, in the input images. While visi…
Compute-aware evaluation framework using FLOPs and risk-compute curves reveals non-monotonic effects of alignment training and varying attack costs across different harm categories.
arXiv cs.AI
TIER_1 · English (EN) · Sunoh Kim, Daeho Um
arXiv:2606.06938v1 Announce Type: new Abstract: Vision-language models such as CLIP have achieved remarkable zero-shot recognition capabilities, yet their robustness against adversarial perturbations remains limited. Test-time counterattack (TTC) was recently proposed to improve CLIP's robustness by perturbing an input image t…
Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but remain highly fragile under adversarial perturbations. Recent test-time adaptation defenses improve robustness by leveraging many augmented views, but this leads to impractical slowdown and a clea…
arXiv:2606.06186v1 Announce Type: new Abstract: Vision-Language Models (VLMs), such as CLIP, have shown strong zero-shot generalization but remain highly vulnerable to adversarial perturbations, posing serious risks in real-world applications. Test-time defenses for VLMs have recently emerged as a promising and efficient appro…
arXiv:2606.03730v1 Announce Type: new Abstract: Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time defenses. Recent approaches exploit how CLIP's visu…