RMPL: Relation-aware Multi-task Progressive Learning with Stage-wise Training for Multimedia Event Extraction
Researchers have developed a new framework called RMPL (Relation-aware Multi-task Progressive Learning) to improve multimedia event extraction, which involves identifying events and their arguments from text and images. This method addresses the scarcity of annotated training data by using stage-wise training with heterogeneous supervision from unimodal event extraction and multimedia relation extraction. Experiments on the M2E2 benchmark demonstrated that RMPL consistently enhances performance across various modality settings when used with multiple Vision-Language Models (VLMs).
AI IMPACT: Introduces a novel approach to improve event extraction in multimodal data, potentially enhancing AI systems that process both text and images.