GeoSAM-3D enables 3D scene segmentation from monocular video

By PulseAugur Editorial · [1 source] · 2026-06-02 04:00

Researchers have introduced GeoSAM-3D, a method for open-vocabulary segmentation of objects in 3D scenes from monocular video alone, without the RGB-D video, calibrated multi-view imagery, or reconstructed mesh that such methods usually assume. A user uploads a short video, clicks or names an object in a single frame, and receives a propagated 3D mask. GeoSAM-3D does this by combining pre-trained image and video models with a 3D Gaussian Splatting reconstruction and a graph-geodesic propagation kernel.
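The summary does not describe how the graph-geodesic propagation kernel is defined, so the following is only a generic sketch of the idea of spreading a single prompt along a graph rather than through straight-line distance. Everything in it is an assumption made for illustration: the points stand in for 3D Gaussian centers, the k-nearest-neighbour graph, the Gaussian falloff, and the names `knn_graph`, `geodesic_distances`, and `propagate_prompt` are hypothetical and are not taken from the paper.

```python
import heapq
import math


def knn_graph(points, k):
    """Link each point to its k nearest neighbours (undirected, Euclidean edge lengths)."""
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        nearest = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in nearest[:k]:
            adj[i][j] = d
            adj[j][i] = d
    return adj


def geodesic_distances(adj, seed):
    """Dijkstra: shortest path length from the seed node along graph edges."""
    dist = {i: math.inf for i in adj}
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def propagate_prompt(points, seed, k=1, sigma=1.5, threshold=0.5):
    """Assumed kernel: weight = exp(-d_geo^2 / (2 sigma^2)), thresholded into a mask."""
    dist = geodesic_distances(knn_graph(points, k), seed)
    weights = [
        math.exp(-(dist[i] ** 2) / (2 * sigma ** 2)) for i in range(len(points))
    ]
    mask = [w >= threshold for w in weights]
    return weights, mask


# Toy usage: two separate clusters of points; the prompt is placed on point 0.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
weights, mask = propagate_prompt(points, seed=0)
```

In this toy setup the second cluster is not connected to the seed in the graph, so its geodesic distance is infinite and its weight is zero. That is the general appeal of a geodesic kernel over a purely Euclidean one: the selection follows connectivity and does not leak across gaps, whatever form the paper's own kernel takes.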

IMPACT Enables detailed 3D scene understanding from readily available monocular video, potentially impacting robotics and AR/VR.

RANK_REASON The cluster contains a research paper detailing a new method for 3D scene segmentation.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Arun Sharma · 2026-06-02 04:00

GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video

arXiv:2606.00447v1 Announce Type: cross Abstract: Open-vocabulary 3D scene segmentation usually assumes RGB-D video, calibrated multi-view imagery, or a reconstructed mesh. GeoSAM-3D studies a lighter setting: a user uploads a short monocular video, clicks or names an object in o…
