Vision-language Models for Driver Monitoring Systems: A Driver Activity Description Dataset
Researchers are exploring the use of vision-language models (VLMs) to better understand driver behavior and attention. One study fine-tuned a VLM on a new dataset of fine-grained driver activity descriptions and reported improved accuracy in interpreting driver actions. Another paper investigated how minimal human supervision can guide VLMs to generate interpretable descriptions of shifts in driver attention, complementing traditional gaze heatmaps.
AI IMPACT: Advances in VLM fine-tuning and dataset creation could lead to more sophisticated driver assistance and safety systems.