PulseAugur

VLMs predict pedestrian intent from egocentric video

Researchers have developed a method for predicting pedestrian crossing intention from egocentric (first-person) video using vision-language models (VLMs). By framing the task as visual question answering, they fine-tuned VLMs that significantly outperform existing transformer-based models. Adding contextual cues such as eye gaze and ego motion further improved prediction accuracy, establishing a new state of the art for this safety-critical application.
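The summary does not describe the authors' prompt or input format. As a rough illustration of what framing intent prediction as visual question answering can look like, the sketch below turns optional gaze and ego-motion cues into a yes/no question and maps a free-text answer back to a binary label. Everything here is hypothetical: the `ClipCues` fields, the prompt wording, and the helpers `build_question` and `parse_answer` are assumptions, and the video frames a real VLM would receive alongside the question are not modeled.

```python
# Hypothetical sketch, not the paper's method: builds a VQA-style question
# from optional contextual cues and parses a free-text yes/no answer.
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ClipCues:
    """Illustrative per-clip cues; field names and units are assumptions."""
    gaze_xy: Optional[List[Tuple[float, float]]] = None  # normalized gaze point per frame
    ego_speed_mps: Optional[List[float]] = None          # walking speed per frame, m/s


def build_question(cues: ClipCues) -> str:
    """Compose a yes/no question, appending each cue as text only if present."""
    parts = ["You are watching a short first-person video from a pedestrian."]
    if cues.gaze_xy:
        pts = "; ".join(f"({x:.2f}, {y:.2f})" for x, y in cues.gaze_xy)
        parts.append(f"Eye-gaze points (normalized x, y) per frame: {pts}.")
    if cues.ego_speed_mps:
        speeds = ", ".join(f"{s:.1f}" for s in cues.ego_speed_mps)
        parts.append(f"Walking speed per frame (m/s): {speeds}.")
    parts.append("Will this pedestrian cross the road? Answer 'yes' or 'no'.")
    return " ".join(parts)


def parse_answer(text: str) -> Optional[int]:
    """Map the answer's first word to 1 (crossing), 0 (not crossing), or None."""
    first_word = re.split(r"\W+", text.strip().lower(), maxsplit=1)[0]
    if first_word == "yes":
        return 1
    if first_word == "no":
        return 0
    return None  # unparseable answer; the caller decides how to handle it


if __name__ == "__main__":
    question = build_question(ClipCues(gaze_xy=[(0.5, 0.4)], ego_speed_mps=[1.2]))
    print(question)
    print(parse_answer("Yes, they step off the curb."))
```

Matching on the first whole word (rather than a plain prefix check) keeps answers such as "not sure" from being read as "no". Encoding cues as text is only one option; gaze could also be drawn onto the frames, and the summary does not say which approach the authors chose.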

IMPACT Establishes a new state of the art for pedestrian intent prediction, potentially improving autonomous-driving safety systems.

RANK_REASON The cluster contains an academic paper detailing a new research methodology and benchmark results.


AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

  1. arXiv cs.AI TIER_1 English(EN) · Danya Li, Xiang Su, Yan Feng, Rico Krueger

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    arXiv:2606.09142v1 · Announce Type: cross · Abstract: Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions fro…

  2. Hugging Face Daily Papers TIER_1 English(EN)

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…

  3. arXiv cs.CV TIER_1 English(EN) · Rico Krueger

    Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

    Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…