Drive-JEPA framework advances end-to-end autonomous driving with novel video pretraining

By PulseAugur Editorial · [1 source] · 2026-07-03 04:00

Researchers have introduced Drive-JEPA, a framework that combines the Video Joint-Embedding Predictive Architecture (V-JEPA) with multimodal trajectory distillation for end-to-end autonomous driving. The approach adapts V-JEPA to pretrain a ViT encoder on extensive driving videos, producing predictive representations that feed trajectory planning. The system also incorporates a proposal-centric planner that distills diverse simulator-generated and human trajectories, and it uses a momentum-aware selection mechanism to promote stable and safe driving behavior. Evaluated on the NAVSIM benchmark, Drive-JEPA reports new state-of-the-art results.
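
The summary does not say how the momentum-aware selection works, so the following is only a minimal sketch of one plausible reading: each candidate trajectory's score is reduced in proportion to its distance from the previously selected trajectory, which discourages abrupt switches between proposals. The function names, the linear penalty, and `momentum_weight` are all hypothetical and are not taken from the paper.

```python
from math import hypot
from typing import List, Optional, Sequence, Tuple

Trajectory = List[Tuple[float, float]]  # (x, y) waypoints in the ego frame


def mean_waypoint_distance(a: Trajectory, b: Trajectory) -> float:
    """Average Euclidean distance between matching waypoints."""
    return sum(hypot(ax - bx, ay - by) for (ax, ay), (bx, by) in zip(a, b)) / len(a)


def select_with_momentum(
    proposals: Sequence[Trajectory],
    scores: Sequence[float],
    previous: Optional[Trajectory] = None,
    momentum_weight: float = 0.5,
) -> int:
    """Return the index of the proposal with the best adjusted score.

    Assumed (hypothetical) rule, not the paper's:
        adjusted = score - momentum_weight * distance_to_previous
    """
    best_index, best_value = 0, float("-inf")
    for i, (proposal, score) in enumerate(zip(proposals, scores)):
        penalty = 0.0
        if previous is not None:
            penalty = momentum_weight * mean_waypoint_distance(proposal, previous)
        value = score - penalty
        if value > best_value:
            best_index, best_value = i, value
    return best_index


# Toy data: two three-waypoint proposals with made-up scores.
straight = [(1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
swerve = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]
proposals = [straight, swerve]
scores = [0.80, 0.85]  # the swerve is marginally better on raw score

first_pick = select_with_momentum(proposals, scores)  # no history yet
next_pick = select_with_momentum(proposals, scores, previous=straight)
```

With no history the raw scores decide, so the swerve is chosen. Once the straight trajectory has been selected, the penalty on the swerve outweighs its small score advantage and the planner stays on the straight path. A learned or velocity-based notion of momentum would change the penalty term but not the overall shape of the selection step.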

IMPACT Introduces a framework for end-to-end driving that reports new state-of-the-art results on the NAVSIM benchmark, potentially improving planning and safety in autonomous systems.

RANK_REASON The cluster contains an arXiv paper detailing a new research framework and model for autonomous driving.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Linhan Wang, Zichong Yang, Chen Bai, Guoxiang Zhang, Xiaotong Liu, Xiaoyin Zheng, Xiao-Xiao Long, Chang-Tien Lu, Cheng Lu · 2026-07-03 04:00

Drive-JEPA: Video JEPA Meets Multimodal Trajectory Distillation for End-to-End Driving

arXiv:2601.22032v2 Announce Type: replace Abstract: End-to-end autonomous driving increasingly leverages self-supervised video pretraining to learn transferable planning representations. However, pretraining video world models for scene understanding has so far brought only limit…
