Researchers have developed a dual-branch gaze prediction framework for interpretable driver attention prediction in autonomous driving. The framework addresses limitations in existing datasets by constructing a new object-level driver attention dataset, G-W3DA, which uses a multimodal large language model and the Segment Anything Model 3 (SAM3) to decouple gaze into object-level masks. The proposed DualGaze-VLM architecture leverages this data to achieve intent-driven spatial anchoring, outperforming current state-of-the-art models on spatial alignment metrics and generating attention heatmaps that human evaluators judge authentic.
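The summary does not specify how gaze is "decoupled" into object-level masks. A minimal sketch of one plausible reading, assuming a segmenter such as SAM3 supplies binary object masks: aggregate a dense gaze heatmap over each mask to get per-object attention scores. The function name and all values here are illustrative, not from the paper.

```python
import numpy as np

def object_level_attention(heatmap, masks):
    """Aggregate a dense gaze heatmap into per-object attention scores.

    heatmap: (H, W) array of non-negative gaze density.
    masks:   list of (H, W) boolean object masks (e.g. from a segmenter).
    Returns one score per mask; gaze mass outside all masks is dropped.
    """
    total = heatmap.sum()
    if total == 0:
        return [0.0 for _ in masks]
    return [float(heatmap[m].sum() / total) for m in masks]

# Toy example: 4x4 heatmap with two disjoint single-pixel object masks.
heatmap = np.zeros((4, 4))
heatmap[0, 0] = 3.0   # most gaze mass falls on object A
heatmap[3, 3] = 1.0   # some falls on object B
mask_a = np.zeros((4, 4), dtype=bool); mask_a[0, 0] = True
mask_b = np.zeros((4, 4), dtype=bool); mask_b[3, 3] = True
scores = object_level_attention(heatmap, [mask_a, mask_b])
print(scores)  # → [0.75, 0.25]
```

Normalizing by the total heatmap mass makes the scores comparable across frames regardless of the absolute gaze density.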
IMPACT Enhances the interpretability of autonomous driving systems by providing more precise, object-level attention prediction.