Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 4d

Enhancing Multimodal Large Language Models for Safety-Critical Driving Video Analysis

Researchers have developed a pipeline that improves how well multimodal large language models (MLLMs) analyze safety-critical driving events. The pipeline fuses downsampled video frames with telematics data and the outputs of specialized computer vision models to produce high-quality training data. Fine-tuning the open-source QwenVL-2.5 model on this data yielded significant improvements in identifying and explaining safety-critical events, all within a limited computational budget.
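To make the fusion step concrete, here is a minimal sketch of how downsampled frames might be paired with telematics readings and vision-model detections to form one training record. It is not the authors' implementation: the record layout, the `TelematicsSample` fields, and the helper names `downsample_indices`, `nearest_sample`, and `build_record` are all hypothetical, and nearest-timestamp alignment is an assumed choice.

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass
class TelematicsSample:
    """Hypothetical telematics reading; field names are illustrative."""
    t: float          # seconds from the start of the clip
    speed_kmh: float
    accel_g: float


def downsample_indices(num_frames: int, keep: int) -> list[int]:
    """Pick up to `keep` evenly spaced frame indices from a clip."""
    if keep >= num_frames:
        return list(range(num_frames))
    if keep == 1:
        return [0]
    step = (num_frames - 1) / (keep - 1)
    return [round(i * step) for i in range(keep)]


def nearest_sample(samples: list[TelematicsSample], t: float) -> TelematicsSample:
    """Return the sample closest in time to `t` (samples sorted by `t`)."""
    times = [s.t for s in samples]
    pos = bisect_left(times, t)
    # Only the neighbours on either side of the insertion point can be nearest.
    candidates = samples[max(pos - 1, 0): pos + 1]
    return min(candidates, key=lambda s: abs(s.t - t))


def build_record(num_frames, fps, samples, detections, keep=8):
    """Fuse sampled frames, telematics, and detections into one record.

    `detections` maps a frame index to a list of labels produced by a
    separate vision model (assumed to exist; not modelled here).
    """
    frames = []
    for idx in downsample_indices(num_frames, keep):
        t = idx / fps
        s = nearest_sample(samples, t)
        frames.append({
            "frame": idx,
            "t": round(t, 3),
            "speed_kmh": s.speed_kmh,
            "accel_g": s.accel_g,
            "objects": detections.get(idx, []),
        })
    return {"frames": frames}
```

A record built this way could then be serialized into a prompt and target pair for fine-tuning; how the original work formats its training examples is not stated in this brief.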

IMPACT Improves the ability of MLLMs to analyze complex, safety-critical driving video, which could benefit autonomous driving systems.

MLLMs
DoRA adapters
QwenVL-2.5