New methods boost VLM robustness against adversarial attacks

By PulseAugur Editorial · [15 sources] · 2026-06-02 14:49

Researchers have proposed several test-time methods to improve the adversarial robustness of vision-language models (VLMs) such as CLIP. SS-TPT uses stability and suitability scores to guide both adaptation and inference, amplifying trustworthy augmented views while suppressing corrupted ones. MAC applies multi-view counterattacks with corruption-aware soft weighting, scaling counterattack intensity to the estimated level of corruption. DBD builds on the observation that adversarial perturbations shift images in a dominant direction, and uses this 'Defense Direction' to recover robust representations, reportedly even surpassing clean accuracy.
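The view-weighting idea shared by SS-TPT and MAC can be illustrated with a minimal sketch. It assumes each augmented view already has class logits and a scalar trust score; how the papers actually compute those scores (stability, suitability, or estimated corruption) is not reproduced here. The helper names, the softmax weighting, and the temperature parameter are illustrative assumptions, not either paper's method.

```python
import math
from typing import Sequence


def softmax(scores: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable softmax: turns per-view trust scores into weights summing to 1."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_logits: Sequence[Sequence[float]],
                    trust_scores: Sequence[float],
                    temperature: float = 1.0) -> list[float]:
    """Hypothetical helper: trust-weighted average of per-view class logits.

    Views with higher trust scores dominate the result, while low-scoring
    (e.g. corrupted) views are softly suppressed rather than discarded.
    """
    if len(view_logits) != len(trust_scores):
        raise ValueError("need exactly one trust score per view")
    weights = softmax(trust_scores, temperature)
    num_classes = len(view_logits[0])
    combined = [0.0] * num_classes
    for w, logits in zip(weights, view_logits):
        for c in range(num_classes):
            combined[c] += w * logits[c]
    return combined


if __name__ == "__main__":
    # Three augmented views of one image, two classes; the third view looks corrupted.
    views = [[2.0, 0.5], [1.8, 0.7], [-1.0, 3.0]]
    uniform = aggregate_views(views, [0.0, 0.0, 0.0])    # equal scores = plain averaging
    weighted = aggregate_views(views, [1.0, 0.9, -2.0])  # corrupted view down-weighted
    print(uniform.index(max(uniform)), weighted.index(max(weighted)))  # prints "1 0"
```

In this toy case plain averaging lets the single corrupted view flip the prediction, while soft weighting keeps the majority answer. Lowering the temperature pushes the weighting toward hard view selection; raising it approaches uniform averaging.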

IMPACT Because these defenses operate at test time, they avoid the computational cost of adversarial training, which could make it more practical to deploy vision-language models safely in real-world applications.

RANK_REASON Multiple research papers proposing novel methods for improving adversarial robustness in vision-language models.


paper
safety

AI-generated summary · Google Gemini · from 15 sources.

COVERAGE [15]

arXiv cs.AI TIER_1 English(EN) · Malikeh Ehghaghi, Boglárka Ecsedi, Marsha Chechik, Colin Raffel · 2026-06-11 04:00

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

arXiv:2606.11409v1 Announce Type: cross Abstract: Adversarial robustness evaluations of large language models (LLMs) typically report attack success rate (ASR) under fixed query budgets, implicitly treating all attacks as equally costly. In practice, the computational expense of …
arXiv cs.AI TIER_1 English(EN) · Lijia Yu, Jiuxin Cao, Yuchen Qiang, Changhao Chen, Yifei Huang, Bo Liu · 2026-06-10 04:00

Improving Adversarial Transferability on Vision-Language Pre-training Models via Surrogate-Specific Bias Correction

arXiv:2606.10571v1 Announce Type: cross Abstract: Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key property is cross-model transferability, which enables transfer-based black-box attacks.…
arXiv cs.CL TIER_1 English(EN) · Eitan Cohen, Idan Simai, Uri Shaham · 2026-06-10 04:00

Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning

arXiv:2606.10610v1 Announce Type: new Abstract: Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited trainin…
arXiv cs.CL TIER_1 English(EN) · Uri Shaham · 2026-06-09 09:11

Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning

Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited training data. We propose SDBN (Small Data Big Noise), …
arXiv cs.AI TIER_1 English(EN) · Bo Liu · 2026-06-09 08:34

Improving Adversarial Transferability on Vision-Language Pre-training Models via Surrogate-Specific Bias Correction

Adversarial examples reveal vulnerabilities in Vision-Language Pre-training (VLP) models and provide insights for improving robustness. A key property is cross-model transferability, which enables transfer-based black-box attacks. However, existing attacks often rely heavily on t…
arXiv cs.AI TIER_1 English(EN) · Hannah Gao (Massachusetts Institute of Technology), Isha Agarwal (Massachusetts Institute of Technology), Dylan Hadfield-Menell (Massachusetts Institute of Technology), Rachel Ma (Massachusetts Institute of Technology) · 2026-06-09 04:00

A Mechanistic Analysis of Adversarial Fine-tuning of Vision Transformers

arXiv:2606.07593v1 Announce Type: cross Abstract: The widespread use of image classification models in high-risk, real-world situations necessitates making these models robust to slight disturbances or perturbations, such as blurring or sharpening, in the input images. While visi…
Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-09 00:00

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

Compute-aware evaluation framework using FLOPs and risk-compute curves reveals non-monotonic effects of alignment training and varying attack costs across different harm categories.
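The risk-compute curve described above can be sketched as attack success rate (ASR) plotted against a FLOP budget instead of a fixed query count. This is a minimal illustration, assuming each attacked prompt is summarized by the cumulative FLOPs spent before the attack's first success (or None if it never succeeded); the function name and input format are hypothetical and are not the paper's framework.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def risk_compute_curve(flops_to_success: Sequence[Optional[float]],
                       budgets: Sequence[float]) -> list[tuple[float, float]]:
    """Hypothetical helper: ASR as a function of FLOP budget.

    flops_to_success[i] is the cumulative FLOPs an attack spent before it first
    succeeded on prompt i, or None if it never succeeded on that prompt.
    """
    n = len(flops_to_success)
    if n == 0:
        raise ValueError("need at least one attacked prompt")
    # Sorted costs of the successful attacks, so each budget needs one binary search.
    costs = sorted(f for f in flops_to_success if f is not None)
    curve = []
    for budget in budgets:
        broken = bisect_right(costs, budget)  # number of successes costing <= budget
        curve.append((budget, broken / n))    # unsuccessful prompts stay in the denominator
    return curve


if __name__ == "__main__":
    # Four prompts; the attack never breaks the third one.
    records = [1e12, 5e12, None, 2e13]
    for budget, asr in risk_compute_curve(records, [1e12, 1e13, 1e14]):
        print(f"{budget:.0e} FLOPs -> ASR {asr:.2f}")
```

Computing one such curve per attack lets attacks be compared at matched compute rather than matched query count, which is the distinction the paper's abstract draws between fixed query budgets and actual computational expense.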
arXiv cs.AI TIER_1 English(EN) · Sunoh Kim, Daeho Um · 2026-06-08 04:00

SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

arXiv:2606.06943v1 Announce Type: cross Abstract: Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but remain highly fragile under adversarial perturbations. Recent test-time adaptation defenses improve robustness by leveraging many augmented views,…
arXiv cs.CV TIER_1 English(EN) · Sunoh Kim, Daeho Um · 2026-06-08 04:00

When CLIP Sees More, It Fights Back Harder: Multi-View Guided Adaptive Counterattacks for Test-Time Adversarial Robustness

arXiv:2606.06938v1 Announce Type: new Abstract: Vision-language models such as CLIP have achieved remarkable zero-shot recognition capabilities, yet their robustness against adversarial perturbations remains limited. Test-time counterattack (TTC) was recently proposed to improve …
arXiv cs.CV TIER_1 English(EN) · Daeho Um · 2026-06-05 06:12

SS-TPT: Stability and Suitability-Guided Test-Time Prompt Tuning for Adversarially Robust Vision-Language Models

Vision-language models (VLMs) such as CLIP achieve strong zero-shot recognition but remain highly fragile under adversarial perturbations. Recent test-time adaptation defenses improve robustness by leveraging many augmented views, but this leads to impractical slowdown and a clea…
arXiv cs.CV TIER_1 English(EN) · Daeho Um · 2026-06-05 06:04

When CLIP Sees More, It Fights Back Harder: Multi-View Guided Adaptive Counterattacks for Test-Time Adversarial Robustness

Vision-language models such as CLIP have achieved remarkable zero-shot recognition capabilities, yet their robustness against adversarial perturbations remains limited. Test-time counterattack (TTC) was recently proposed to improve CLIP's robustness by perturbing an input image t…
arXiv cs.CV TIER_1 English(EN) · Liangsheng Liu, Si Chen, Jiamin Wu, Weiwei Feng, Zhixin Cheng, Xiaotian Yin, Wenfei Yang, Tianzhu Zhang · 2026-06-05 04:00

Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models

arXiv:2606.06186v1 Announce Type: new Abstract: Vision-Language Models (VLMs), such as CLIP, have shown strong zero-shot generalization but remain highly vulnerable to adversarial perturbations, posing serious risks in real-world applications. Test-time defenses for VLMs have rec…
arXiv cs.CV TIER_1 English(EN) · Tianzhu Zhang · 2026-06-04 13:57

Adversarial Attacks Already Tell the Answer: Directional Bias-Guided Test-time Defense for Vision-Language Models

Vision-Language Models (VLMs), such as CLIP, have shown strong zero-shot generalization but remain highly vulnerable to adversarial perturbations, posing serious risks in real-world applications. Test-time defenses for VLMs have recently emerged as a promising and efficient appro…
arXiv cs.CV TIER_1 English(EN) · Hashmat Shadab Malik, Muzammal Naseer, Salman Khan · 2026-06-03 04:00

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

arXiv:2606.03730v1 Announce Type: new Abstract: Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time def…
arXiv cs.CV TIER_1 English(EN) · Salman Khan · 2026-06-02 14:49

Beyond False Stability: High-Noise Drift Gating for Test-Time Adversarial Defenses in Vision-Language Models

Vision-language models (VLMs) such as CLIP show strong zero-shot generalization but remain highly vulnerable to adversarial attacks. Adversarial training improves robustness but is computationally expensive, motivating test-time defenses. Recent approaches exploit how CLIP's visu…
