PulseAugur

New frameworks MCM-VG and DEGround advance zero-shot and ego-centric 3D visual grounding

Two research teams have introduced frameworks for 3D visual grounding, a key task for embodied intelligence. DEGround targets ego-centric 3D visual grounding: instead of a two-stage, heterogeneous pipeline that pairs a detector with a separate grounding model, it uses a homogeneous pipeline that shares object representations between detection and grounding, which the authors present as improving efficiency and performance. MCM-VG targets zero-shot 3D visual grounding, where open-vocabulary 3D proposals suffer from inaccurate categories and imprecise geometries; it establishes multiple consistent 2D-3D mappings to localize objects precisely and reduce spatial redundancy. Both papers report state-of-the-art results on their benchmarks, outperforming previous approaches.

Impact: Advances in 3D visual grounding could accelerate the development of more capable embodied AI agents and robots.

Ranking rationale: Two new academic papers introduce frameworks for 3D visual grounding tasks.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 3 sources. How we write summaries →

Sources [3]

  1. arXiv cs.CV TIER_1 English(EN) · Yufei Yin, Jie Zheng, Qianke Meng, Zhou Yu, Minghao Chen, Jiajun Ding, Min Tan, Yuling Xi, Zhiwen Chen, Chengfei Lv ·

    Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

    arXiv:2604.26261v1 Announce Type: new Abstract: Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate catego…

  2. arXiv cs.CV TIER_1 English(EN) · Yani Zhang, Dongming Wu, Hao Shi, Yingfei Liu, Tiancai Wang, Xingping Dong ·

    DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

    arXiv:2506.05199v3 Announce Type: replace Abstract: A core task in embodied intelligence is ego-centric 3D visual grounding. Existing methods typically adopt two-stage, heterogeneous pipelines that pair a detector with a separate grounding model. Incompatible decoders and box hea…

  3. arXiv cs.CV TIER_1 English(EN) · Chengfei Lv ·

    Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

    Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate categories and imprecise geometries, as well as the sp…