Robust-U1: Can MLLMs Self-Recover Corrupted Visual Content for Robust Understanding?
Researchers have developed Robust-U1, a framework designed to make Multimodal Large Language Models (MLLMs) more robust to corrupted visual content. The approach enables an MLLM to self-recover damaged images, improving its ability to understand and reason about visual information. The framework uses a three-stage process (supervised fine-tuning, reinforcement learning with dual rewards, and multimodal reasoning) and reports state-of-the-art performance on corruption benchmarks.
IMPACT: Enhances MLLM robustness against visual corruption, which could make real-world applications more reliable.