Deep Residual Injection for Full-Spectrum Forensic Signal Perception in Multimodal Large Language Models
Researchers have proposed Deep Visual Residual MLLM (Deep-VRM), a method for strengthening the forensic capabilities of multimodal large language models (MLLMs). The approach preserves the model's pre-trained semantic understanding while injecting low-level artifact signals through a residual path, so the model can process semantic reasoning and forensic cues jointly when detecting AI-generated content. According to the reported experiments, this yields robust, generalizable detection and state-of-the-art performance on various benchmarks.
IMPACT Enhances MLLM capabilities for detecting AI-generated content by improving forensic signal perception.
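
To make the idea of a residual path more concrete, here is a minimal sketch in plain Python. It is not the Deep-VRM implementation, and the summary gives no architectural details. The high-pass filter, the linear projection, the scalar gate, and every name below (`high_pass`, `project`, `inject_residual`, `gate`) are assumptions chosen only for illustration. The sketch shows one property that a residual design makes possible: with the gate at zero, the pre-trained semantic features pass through unchanged.

```python
# Illustrative sketch only: NOT the Deep-VRM implementation.
# All function names, the filter, and the gate are hypothetical.

def high_pass(signal):
    """Crude low-level cue: each sample minus the mean of its two neighbours."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]       # clamp at the left border
        right = signal[min(i + 1, n - 1)]  # clamp at the right border
        out.append(signal[i] - 0.5 * (left + right))
    return out


def project(vec, weights):
    """Linear map; `weights` holds one row per output dimension."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def inject_residual(semantic, forensic, weights, gate):
    """Add the projected forensic signal to the semantic features."""
    delta = project(forensic, weights)
    return [s + gate * d for s, d in zip(semantic, delta)]


# Toy inputs (made up for the example).
semantic = [0.2, -0.1, 0.7]        # stands in for pre-trained features
pixels = [1.0, 1.0, 4.0, 1.0]      # stands in for raw image samples
forensic = high_pass(pixels)       # isolates the local spike at index 2

# Hypothetical projection from 4 forensic values to 3 feature dimensions.
weights = [
    [0.1, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.1],
]

unchanged = inject_residual(semantic, forensic, weights, gate=0.0)
fused = inject_residual(semantic, forensic, weights, gate=1.0)
```

With `gate=0.0` the output equals `semantic`, which mirrors the stated goal of preserving pre-trained understanding. With a non-zero gate the forensic signal shifts the features additively rather than replacing them.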