VLMs predict pedestrian intent from egocentric video

By PulseAugur Editorial · [3 sources] · 2026-06-08 07:39

Researchers have developed a method for predicting pedestrian crossing intentions from short egocentric video clips using vision-language models (VLMs). By framing the task as visual question answering, they fine-tuned VLMs that significantly outperform existing transformer-based models. Adding contextual cues such as eye gaze and ego motion further improved prediction accuracy, setting a new state of the art for this safety-critical application.
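
The visual-question-answering framing can be pictured as turning each labeled clip into a question/answer record that a VLM is fine-tuned on. The Python sketch below is illustrative only: the `ClipSample` fields, the question wording, and the way gaze and ego-motion cues are written into the prompt are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical question text; the paper's actual prompt is not given in the summary.
QUESTION = "Will the pedestrian cross the road? Answer yes or no."


@dataclass
class ClipSample:
    """A labeled egocentric clip (all field names are assumptions)."""
    frame_paths: List[str]                 # sampled frames from the clip
    will_cross: bool                       # ground-truth crossing label
    gaze_note: Optional[str] = None        # optional eye-gaze cue as text
    ego_motion_note: Optional[str] = None  # optional ego-motion cue as text


def to_vqa_example(sample: ClipSample) -> Dict[str, object]:
    """Convert a labeled clip into a VQA-style fine-tuning record."""
    context = []
    if sample.gaze_note:
        context.append(f"Gaze: {sample.gaze_note}")
    if sample.ego_motion_note:
        context.append(f"Ego motion: {sample.ego_motion_note}")
    # Contextual cues, when present, are placed ahead of the question.
    prompt = "\n".join(context + [QUESTION])
    return {
        "images": list(sample.frame_paths),
        "question": prompt,
        "answer": "yes" if sample.will_cross else "no",
    }
```

A record built this way could be fed to whatever fine-tuning pipeline a given VLM expects; dropping the optional notes yields a vision-only baseline for comparison.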

IMPACT Establishes a new state-of-the-art for pedestrian intent prediction, potentially improving autonomous driving safety systems.

RANK_REASON The cluster contains an academic paper detailing a new research methodology and benchmark results.

AI-generated summary · Google Gemini · from 3 sources.

COVERAGE [3]

arXiv cs.AI TIER_1 English(EN) · Danya Li, Xiang Su, Yan Feng, Rico Krueger · 2026-06-09 04:00

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

arXiv:2606.09142v1. Abstract: Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions fro…

Hugging Face Daily Papers TIER_1 English(EN) · 2026-06-08 07:39

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…

arXiv cs.CV TIER_1 English(EN) · Rico Krueger · 2026-06-08 07:39

Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models

Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this b…
