New frameworks MCM-VG and DEGround advance 3D visual grounding

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 3 sources

Two arXiv papers present frameworks for 3D visual grounding, a key task for embodied intelligence. DEGround targets ego-centric 3D visual grounding. It replaces the usual two-stage, heterogeneous pairing of a detector with a separate grounding model by a homogeneous pipeline that shares object representations between detection and grounding, which the authors credit for gains in efficiency and performance. MCM-VG targets zero-shot 3D visual grounding, where existing methods are limited by low-quality open-vocabulary 3D proposals with inaccurate categories and imprecise geometries. It establishes multiple consistent 2D-3D mappings to localize objects more precisely and to reduce spatial redundancy. Both papers report state-of-the-art results on the benchmarks they evaluate, outperforming previous approaches.
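As a rough illustration only: the summary does not say how MCM-VG builds or checks its 2D-3D mappings, so the sketch below is not the paper's method. It shows one generic way a 3D proposal could be checked against 2D detections from several views, using a standard pinhole projection. Every name here (`project`, `to_camera`, `count_consistent_views`), the translation-only camera model, and the intrinsics are assumptions made for the example.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels
Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy)


def project(point_cam: Point3D, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Pinhole projection of a point in camera coordinates (z points forward)."""
    x, y, z = point_cam
    if z <= 0:  # behind the camera: no valid pixel
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def to_camera(point_world: Point3D, cam_position: Point3D) -> Point3D:
    """World-to-camera transform. Assumption: the camera is axis-aligned
    with the world, so only a translation is applied (no rotation)."""
    return (point_world[0] - cam_position[0],
            point_world[1] - cam_position[1],
            point_world[2] - cam_position[2])


def count_consistent_views(center_world: Point3D,
                           views: List[Tuple[Point3D, Box2D]],
                           intrinsics: Intrinsics) -> int:
    """Count views in which the projected 3D center lands inside that
    view's 2D detection box (a hypothetical consistency criterion)."""
    hits = 0
    for cam_position, box in views:
        pixel = project(to_camera(center_world, cam_position), *intrinsics)
        if pixel is None:
            continue
        u, v = pixel
        x0, y0, x1, y1 = box
        if x0 <= u <= x1 and y0 <= v <= y1:
            hits += 1
    return hits


intrinsics = (500.0, 500.0, 320.0, 240.0)
views = [
    ((0.0, 0.0, 0.0), (300.0, 220.0, 340.0, 260.0)),  # center projects inside
    ((0.5, 0.0, 0.0), (180.0, 220.0, 210.0, 260.0)),  # shifted camera, inside
    ((0.0, 0.0, 3.0), (300.0, 220.0, 340.0, 260.0)),  # object behind camera
    ((0.0, 0.0, 0.0), (0.0, 0.0, 100.0, 100.0)),      # box elsewhere in image
]
agreeing = count_consistent_views((0.0, 0.0, 2.0), views, intrinsics)
```

In a sketch like this, a proposal would be kept only if enough views agree; the threshold, like everything else above, would be a design choice rather than something stated in the source.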


IMPACT Advances in 3D visual grounding could accelerate the development of more capable embodied AI agents and robots.

RANK_REASON Two new academic papers introduce novel frameworks for 3D visual grounding tasks.


COVERAGE [3]

arXiv cs.CV TIER_1 · Yufei Yin, Jie Zheng, Qianke Meng, Zhou Yu, Minghao Chen, Jiajun Ding, Min Tan, Yuling Xi, Zhiwen Chen, Chengfei Lv · 2026-04-30 04:00

Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

arXiv:2604.26261v1 (announce type: new). Abstract: Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate catego…

arXiv cs.CV TIER_1 · Yani Zhang, Dongming Wu, Hao Shi, Yingfei Liu, Tiancai Wang, Xingping Dong · 2026-04-29 04:00

DEGround: An Effective Baseline for Ego-centric 3D Visual Grounding with a Homogeneous Framework

arXiv:2506.05199v3 (announce type: replace). Abstract: A core task in embodied intelligence is ego-centric 3D visual grounding. Existing methods typically adopt two-stage, heterogeneous pipelines that pair a detector with a separate grounding model. Incompatible decoders and box hea…

arXiv cs.CV TIER_1 · Chengfei Lv · 2026-04-29 03:38

Multiple Consistent 2D-3D Mappings for Robust Zero-Shot 3D Visual Grounding

Zero-shot 3D Visual Grounding (3DVG) is a critical capability for open-world embodied AI. However, existing methods are fundamentally bottlenecked by the poor quality of open-vocabulary 3D proposals, suffering from inaccurate categories and imprecise geometries, as well as the sp…
