Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models
Researchers have developed new methods to improve the adversarial robustness of vision-language models (VLMs) such as CLIP. SS-TPT uses stability and suitability scores to guide adaptation and inference, amplifying trustworthy views while suppressing corrupted ones. MAC employs multi-view counterattacks with corruption-aware soft weighting, adaptively scaling their intensity based on the estimated corruption. DBD builds on the observation that adversarial images shift in a dominant direction, using this 'Defense Direction' to recover robust representations and even surpass clean accuracy.
AI IMPACT: These advances in adversarial robustness are important for deploying vision-language models safely in real-world applications.
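
The idea of amplifying trustworthy views while suppressing corrupted ones, mentioned in the summary for SS-TPT and MAC, can be illustrated with a minimal sketch. It is not the implementation of either method: the function names `soft_weights` and `aggregate_views`, the `temperature` parameter, and the example trust scores are all hypothetical, and a plain softmax stands in for whatever scoring the methods actually use.

```python
import math


def soft_weights(scores, temperature=1.0):
    """Turn per-view trust scores into weights with a softmax.

    Assumption: a higher score means the view looks less corrupted.
    """
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_views(view_probs, scores, temperature=1.0):
    """Fuse per-view class probabilities into one weighted prediction."""
    weights = soft_weights(scores, temperature)
    n_classes = len(view_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, view_probs))
        for c in range(n_classes)
    ]


# Three augmented views of one image, two classes.
# The third view disagrees with the others and gets a low (hypothetical) score.
view_probs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]]
scores = [2.0, 1.5, -3.0]

weights = soft_weights(scores)
fused = aggregate_views(view_probs, scores)
```

With these made-up numbers the low-scoring view receives almost no weight, so the fused prediction follows the two views that agree. Lowering `temperature` sharpens that effect, while raising it moves the result toward a plain average.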