Brief · PulseAugur

RESEARCH · arXiv cs.AI · English (EN) · 6d · [4 sources]

DVGT: Driving Visual Geometry Transformer

Three new transformer-based models target 3D scene reconstruction from visual inputs. DVGT, the Driving Visual Geometry Transformer, is trained on diverse driving datasets and reconstructs dense 3D point maps from unposed multi-view images without explicit geometric priors. VG^2GT improves Gaussian splatting by combining frozen visual foundation models with a voxel module to regress Gaussian primitive parameters directly, which reduces training cost and is reported to outperform existing methods. QVGGT addresses the difficulty of deploying large transformer models with a quantization framework that applies mixed precision and token filtering selectively, enabling high-fidelity 3D perception on edge devices.

IMPACT Advances in 3D reconstruction and model compression could enable more capable perception for autonomous driving and for deployment on edge devices.

VGGT
QVGGT
Gaussian splatting
ScanNet
Replica
nuScenes
Visual Geometry Transformer
DVGT
Visual Foundation Model
Waymo
VG^2GT
KITTI