BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM
Researchers have developed BusterX++, a multimodal large language model (MLLM) designed for unified detection and explanation of AI-generated content across images and videos. The approach aims to address the growing problem of visual misinformation by exploiting cross-modal synergies. The authors also introduce a new benchmark, GenBuster-Bench++, to support research in this area. Notably, the study found that a single-stage reinforcement learning (RL) strategy driven by sparse rewards can match or even surpass the conventional pipeline of supervised fine-tuning followed by RL. The authors suggest that the higher policy entropy of pure RL helps the model develop cross-modal capabilities.
AI IMPACT: This research could lead to more robust tools for combating AI-generated misinformation across different media types.