New benchmark tests VLM robustness to physical visual stress

By PulseAugur Editorial · [2 sources] · 2026-05-30 00:00

Researchers have introduced RoboStressBench, a benchmark for evaluating the robustness of vision-language models (VLMs) in embodied AI systems. The benchmark decomposes visual stress into four physical dimensions: material, viewpoint, lighting, and geometry. By assessing VLMs under these conditions, RoboStressBench aims to identify specific failure modes and improve the reliability of AI perception in real-world scenarios.

IMPACT Provides a framework for assessing and improving VLM reliability in physical environments, which is crucial for embodied AI applications.

RANK_REASON The cluster describes a new academic paper introducing a benchmark for evaluating AI models.

paper
safety

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

Hugging Face Daily Papers TIER_1 English(EN) · 2026-05-30 00:00

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

RoboStressBench presents a principled benchmark for evaluating vision-language model robustness to physical visual stress in embodied AI, decomposing visual stress into material, viewpoint, lighting, and geometry dimensions.

arXiv cs.CV TIER_1 English(EN) · Leyi Wu, Yifan Zhao, Jinjie Zhang, Suzeyu Chen, Wosong Chen, Zhifei Chen, Tianshuo Xu, Qingchun He, Hongxin Hu, Haojian Huang, Yangkai Wei, Wenqian Li, Yinchuan Li, Ying-Cong Chen · 2026-06-02 04:00

RoboStressBench: Benchmarking VLM Robustness to Physical Visual Stress in Embodied Scenes

arXiv:2606.00828v1 · Abstract: Vision-Language Models (VLMs) have shown strong visual understanding and are increasingly deployed in embodied AI systems, where reliable perception under real conditions is essential. However, existing benchmarks assess VLMs using …
