PulseAugur

New Transformers Enhance 3D Scene Reconstruction and Edge Deployment

Researchers have introduced three transformer-based models for 3D scene reconstruction from visual inputs. DVGT (Driving Visual Geometry Transformer) reconstructs dense 3D point maps from unposed multi-view images without explicit geometric priors and is trained on diverse driving datasets. VG^2GT enhances Gaussian splatting by using frozen visual foundation models and a voxel module to regress Gaussian primitive parameters directly, which reduces training cost; the authors report that it outperforms existing methods. QVGGT targets deployment of the 1.2B-parameter VGGT model with a post-training quantization framework that selectively applies mixed precision and token filtering, aiming at high-fidelity 3D perception on edge devices.
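To make the idea of selectively applied mixed precision concrete, here is a minimal sketch of generic post-training quantization. It is not the QVGGT method, whose details are not given in the summary. It assumes per-tensor symmetric quantization, a mean-squared-error tolerance, and a fallback of 16 bits for tensors that cannot be compressed within that tolerance. All function names and parameters are hypothetical.

```python
from typing import List, Sequence, Tuple


def quantize_symmetric(weights: Sequence[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integers using one shared scale (per-tensor, symmetric)."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float values from the integer codes."""
    return [v * scale for v in q]


def mean_squared_error(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def choose_bits(weights: Sequence[float], tolerance: float,
                candidates: Sequence[int] = (4, 8)) -> int:
    """Pick the smallest candidate bit width whose reconstruction error
    stays within the tolerance; return 16 (assumed higher-precision
    fallback) if no candidate is accurate enough."""
    for bits in candidates:
        q, scale = quantize_symmetric(weights, bits)
        if mean_squared_error(weights, dequantize(q, scale)) <= tolerance:
            return bits
    return 16


if __name__ == "__main__":
    layer = [0.1, -0.2, 0.3, -0.4]  # toy weights, not from any real model
    for tol in (1e-3, 1e-4, 1e-9):
        print(f"tolerance={tol:g} -> {choose_bits(layer, tol)} bits")
```

The design choice illustrated here is that precision is assigned per tensor from a measured error, so sensitive tensors keep more bits while tolerant ones are compressed harder.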

IMPACT Advances in 3D reconstruction and model compression enable more sophisticated AI applications in autonomous driving and edge devices.

RANK_REASON Multiple research papers introducing novel transformer-based models for 3D scene reconstruction and optimization techniques.


AI-generated summary · Google Gemini · from 4 sources.

COVERAGE [4]

  1. arXiv cs.AI TIER_1 English(EN) · Sicheng Zuo, Zixun Xie, Wenzhao Zheng, Shaoqing Xu, Fang Li, Shengyin Jiang, Long Chen, Zhi-Xin Yang, Jiwen Lu ·

    DVGT: Driving Visual Geometry Transformer

    arXiv:2512.16919v2. Abstract: Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can adapt to different scenarios and …

  2. arXiv cs.CV TIER_1 English(EN) · Yibin Zhao, Yihan Pan, Jun Nan, Wenli Yang, Liwei Chen, Jianjun Yi ·

    $\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer

    arXiv:2606.01573v1. Abstract: Gaussian splatting has shown strong potential for 3D reconstruction and novel view synthesis. However, most existing methods require accurate camera parameters and per-scene optimization, while feed-forward methods with pixel-aligne…

  3. arXiv cs.CV TIER_1 English(EN) · Zhizhen Pan, Hesong Wang, Huan Wang ·

    QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

    arXiv:2605.31124v1. Abstract: Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-para…

  4. arXiv cs.CV TIER_1 English(EN) · Huan Wang ·

    QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

    Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-parameter scale severely limits deployment on resour…