Brief · PulseAugur

TOOL · arXiv cs.CV · English (EN) · 12h

BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM

Researchers have introduced BusterX++, a multimodal large language model (MLLM) that detects and explains AI-generated content in both images and videos within a single framework. The approach targets the growing problem of visual misinformation by exploiting cross-modal synergies. The authors also introduce a new benchmark, GenBuster-Bench++, to facilitate research in this area. Notably, the study found that a single-stage reinforcement learning (RL) strategy driven by sparse rewards can match or even surpass the conventional pipeline of supervised fine-tuning followed by RL, suggesting that the higher policy entropy of pure RL helps the model develop cross-modal capabilities.
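For readers unfamiliar with the two terms in that finding, the sketch below illustrates what a sparse reward and policy entropy mean in general. It is not the paper's implementation: the function names, the real/fake label strings, and the example probabilities are all assumptions made for illustration, and the brief does not describe the actual reward design.

```python
import math
from typing import Sequence


def sparse_reward(predicted_label: str, true_label: str) -> float:
    """Hypothetical outcome-only reward: 1.0 for a correct verdict, else 0.0.

    "Sparse" here means the model gets no partial credit for intermediate
    reasoning; only the final real/fake decision is scored.
    """
    return 1.0 if predicted_label == true_label else 0.0


def policy_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a categorical distribution.

    Higher entropy means the policy spreads probability over more options,
    i.e. it still explores instead of committing to one output.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Illustrative (made-up) next-token distributions.
peaked = [0.97, 0.01, 0.01, 0.01]   # low entropy: nearly deterministic
spread = [0.40, 0.30, 0.20, 0.10]   # higher entropy: more exploratory

print(sparse_reward("fake", "fake"))                     # correct verdict
print(sparse_reward("real", "fake"))                     # wrong verdict
print(policy_entropy(spread) > policy_entropy(peaked))   # spread is more exploratory
```

In this toy setting, the peaked distribution stands in for a policy that has already committed to a narrow set of outputs, while the spread distribution stands in for one that keeps exploring. The brief's claim is that the latter property helps when learning across modalities.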

IMPACT This research could lead to more robust tools for combating AI-generated misinformation across different media types.

reinforcement learning
multimodal large language model
supervised fine-tuning
BusterX++
GenBuster-Bench++
Haiquan Wen