DVGT: Driving Visual Geometry Transformer
Researchers have developed new transformer-based models for 3D scene reconstruction from visual inputs. DVGT, a Driving Visual Geometry Transformer trained on diverse driving datasets, reconstructs dense 3D point maps from unposed multi-view images without explicit geometric priors. VG^2GT enhances Gaussian splatting by using frozen visual foundation models and a voxel module to directly regress Gaussian primitive parameters, reducing training costs and outperforming existing methods. QVGGT addresses the deployment challenges of large transformer models with a quantization framework that selectively applies mixed precision and token filtering, enabling high-fidelity 3D perception on edge devices.
AI IMPACT: Advances in 3D reconstruction and model compression enable more sophisticated AI applications in autonomous driving and on edge devices.
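
To make the idea of selectively applied mixed precision concrete, here is a minimal, generic sketch of symmetric uniform quantization with a different bit width per layer group. It is not QVGGT's actual method: the layer names, the `BIT_PLAN` assignment, and the toy weights are all hypothetical, chosen only to show that keeping some parts at higher precision preserves more fidelity than quantizing everything aggressively.

```python
from typing import Dict, List


def quantize_symmetric(values: List[float], bits: int) -> List[float]:
    """Round values onto a signed `bits`-bit grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1                 # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0.0 for _ in values]
    scale = max_abs / qmax                     # one scale per tensor (assumption)
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical plan: keep a "sensitive" group at 8 bits, push the rest to 4 bits.
BIT_PLAN: Dict[str, int] = {"attention": 8, "mlp": 4}


def quantize_model(weights: Dict[str, List[float]],
                   plan: Dict[str, int]) -> Dict[str, List[float]]:
    """Apply the per-group bit width from `plan` to each weight list."""
    return {name: quantize_symmetric(w, plan[name]) for name, w in weights.items()}


# Toy weights: identical values in both groups so only the bit width differs.
weights = {
    "attention": [0.9, -0.31, 0.05, 0.62],
    "mlp": [0.9, -0.31, 0.05, 0.62],
}
quantized = quantize_model(weights, BIT_PLAN)
errors = {name: mean_abs_error(weights[name], quantized[name]) for name in weights}
```

In this toy setup the 8-bit group ends up with a noticeably smaller reconstruction error than the 4-bit group, which is the trade-off a mixed-precision scheme exploits: spend bits where accuracy is most sensitive and save memory elsewhere.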