PulseAugur
research · [3 sources]

New frameworks MCM-VG and DEGround advance zero-shot and ego-centric 3D visual grounding

Two research teams have introduced new frameworks for 3D visual grounding, a key capability for embodied intelligence. DEGround targets ego-centric 3D visual grounding. Where existing methods typically pair a detector with a separate grounding model in a two-stage, heterogeneous pipeline, DEGround uses a homogeneous pipeline that shares object representations between detection and grounding, which is credited with improving both efficiency and performance. MCM-VG targets zero-shot 3D visual grounding, where existing methods are bottlenecked by low-quality open-vocabulary 3D proposals with inaccurate categories and imprecise geometries. It establishes multiple consistent 2D-3D mappings to localize objects precisely and reduce spatial redundancy. Both papers report state-of-the-art results on their benchmarks, with significant gains over previous approaches.
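As a rough illustration of what a 2D-3D consistency check can look like, the sketch below projects 3D points into several camera views with a pinhole model and accepts a candidate only if its projections fall inside the matching 2D box in every view. This is a generic example, not the MCM-VG method: the function names, the box-based test, the 0.8 threshold, and the assumption that points are already expressed in each camera's frame are all hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def project_point(point: Point3D, focal: float, cx: float, cy: float
                  ) -> Optional[Tuple[float, float]]:
    """Project a camera-frame 3D point to pixel coordinates (pinhole model).

    Returns None when the point is not in front of the camera (z <= 0).
    """
    x, y, z = point
    if z <= 0:
        return None
    return (focal * x / z + cx, focal * y / z + cy)


def fraction_inside(points: Sequence[Point3D], box: Box2D,
                    focal: float, cx: float, cy: float) -> float:
    """Fraction of 3D points whose projection lands inside a 2D box."""
    if not points:
        return 0.0
    x_min, y_min, x_max, y_max = box
    hits = 0
    for p in points:
        uv = project_point(p, focal, cx, cy)
        if uv is not None and x_min <= uv[0] <= x_max and y_min <= uv[1] <= y_max:
            hits += 1
    return hits / len(points)


def is_consistent(views: List[Tuple[Sequence[Point3D], Box2D]],
                  focal: float, cx: float, cy: float,
                  threshold: float = 0.8) -> bool:
    """True when, in every view, enough points project into that view's box.

    Each view is (points in that camera's frame, 2D box in that image).
    The threshold is a hypothetical value chosen for this sketch.
    """
    return all(
        fraction_inside(pts, box, focal, cx, cy) >= threshold
        for pts, box in views
    )
```

Requiring agreement across all views, rather than in a single view, is one simple way to reject 3D candidates whose geometry matches a 2D detection only by coincidence.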

Summary written by gemini-2.5-flash-lite from 3 sources.

IMPACT Advances in 3D visual grounding could accelerate the development of more capable embodied AI agents and robots.

RANK_REASON Two new academic papers introduce novel frameworks for 3D visual grounding tasks.


COVERAGE [3]

  1. arXiv cs.CV TIER_1 · Yufei Yin, Jie Zheng, Qianke Meng, Zhou Yu, Minghao Chen, Jiajun Ding, Min Tan, Yuling Xi, Zhiwen Chen, Chengfei Lv ·

    Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

    arXiv:2604.26261v1 (announce type: new). Abstract: Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate catego…

  2. arXiv cs.CV TIER_1 · Yani Zhang, Dongming Wu, Hao Shi, Yingfei Liu, Tiancai Wang, Xingping Dong ·

    DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

    arXiv:2506.05199v3 (announce type: replace). Abstract: A core task in embodied intelligence is ego-centric 3D visual grounding. Existing methods typically adopt two-stage, heterogeneous pipelines that pair a detector with a separate grounding model. Incompatible decoders and box hea…

  3. arXiv cs.CV TIER_1 · Chengfei Lv ·

    Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

    Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate categories and imprecise geometries, as well as the sp…