PulseAugur

New Transformers Advance 3D Scene Reconstruction and Edge Deployment

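Researchers have developed new Transformer-based models for reconstructing 3D scenes from visual input. DVGT (Driving Visual Geometry Transformer) reconstructs dense 3D point maps from unposed multi-view images without explicit geometric priors, and is trained on diverse driving datasets. VG^2GT enhances Gaussian splatting by using a frozen visual foundation model and a voxel module to regress Gaussian primitive parameters directly, which lowers training cost and outperforms existing methods. QVGGT addresses the deployment challenges of large Transformer models with a quantization framework that selectively applies mixed precision and token filtering, enabling high-fidelity 3D perception on edge devices.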
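To make the mixed-precision idea in the summary concrete, here is a minimal sketch of symmetric uniform quantization applied at two bit widths. It is not the QVGGT implementation: the summary does not say how layers are assigned a precision, so the names `quantize_dequantize`, `sensitive`, and `robust`, and the 8-bit and 4-bit choices, are all assumptions made for illustration.

```python
import numpy as np


def quantize_dequantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Round w onto a signed, symmetric `bits`-bit grid and map it back to floats."""
    qmax = 2 ** (bits - 1) - 1          # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = float(np.max(np.abs(w)))
    if max_abs == 0.0:                  # an all-zero tensor needs no quantization
        return w.copy()
    scale = max_abs / qmax              # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax, qmax)
    return q * scale


def mean_abs_error(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean(np.abs(a - b)))


rng = np.random.default_rng(0)
# Hypothetical weights: one layer assumed sensitive, one assumed robust.
sensitive = rng.normal(size=(64, 64))
robust = rng.normal(size=(64, 64))

# Mixed precision: keep more bits where error is assumed to matter more.
err_8 = mean_abs_error(sensitive, quantize_dequantize(sensitive, bits=8))
err_4 = mean_abs_error(robust, quantize_dequantize(robust, bits=4))
print(f"8-bit error: {err_8:.4f}, 4-bit error: {err_4:.4f}")
```

The 4-bit tensor takes half the storage of the 8-bit one but carries a visibly larger rounding error, which is the trade-off a selective scheme tries to spend only where the model tolerates it.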

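Impact: Advances in 3D reconstruction and model compression enable more sophisticated AI applications in autonomous driving and on edge devices.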

Ranking rationale: Multiple research papers introduce new Transformer models for 3D scene reconstruction, along with optimization techniques.


AI-generated summary · Google Gemini · from 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Sicheng Zuo, Zixun Xie, Wenzhao Zheng, Shaoqing Xu, Fang Li, Shengyin Jiang, Long Chen, Zhi-Xin Yang, Jiwen Lu

    DVGT: Driving Visual Geometry Transformer

    arXiv:2512.16919v2. Abstract: Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can adapt to different scenarios and …

  2. arXiv cs.CV TIER_1 English(EN) · Yibin Zhao, Yihan Pan, Jun Nan, Wenli Yang, Liwei Chen, Jianjun Yi

    $\text{VG}^2$GT: Voxel-Gaussian Splatting Visual Geometry Grounded Transformer

    arXiv:2606.01573v1. Abstract: Gaussian splatting has shown strong potential for 3D reconstruction and novel view synthesis. However, most existing methods require accurate camera parameters and per-scene optimization, while feed-forward methods with pixel-aligne…

  3. arXiv cs.CV TIER_1 English(EN) · Zhizhen Pan, Hesong Wang, Huan Wang

    QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

    arXiv:2605.31124v1. Abstract: Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-para…

  4. arXiv cs.CV TIER_1 English(EN) · Huan Wang

    QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

    Estimating 3D attributes directly from images has advanced rapidly with the Visual Geometry Grounded Transformer (VGGT), which predicts camera parameters, depth maps, and point clouds in a single forward pass. However, its 1.2B-parameter scale severely limits deployment on resour…