GeoSAM-3D: Geodesic Prompt Propagation for Open-Vocabulary 3D Scene Segmentation from Monocular Video
Researchers have introduced GeoSAM-3D, a method for segmenting objects in 3D scenes using only monocular video. A user uploads a short video, selects an object in a single frame, and receives a propagated 3D mask. GeoSAM-3D achieves this by combining pre-trained image and video models with a 3D Gaussian Splatting reconstruction and a graph-geodesic propagation kernel.
AI IMPACT: Enables detailed 3D scene understanding from readily available monocular video, with potential applications in robotics and AR/VR.
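
The summary does not define the graph-geodesic propagation kernel, so the following is only a generic illustration of the idea and not GeoSAM-3D's implementation. It assumes the reconstructed scene is reduced to a set of 3D points (for example, Gaussian centers), builds a k-nearest-neighbour graph over them, measures shortest-path (geodesic) distance from the user-selected seed points, and turns that distance into a soft score with a Gaussian falloff. The function names, the falloff, and the parameters `k`, `sigma`, and `threshold` are all hypothetical choices made for this sketch.

```python
import heapq
import numpy as np

def knn_graph(points, k):
    """Symmetric k-nearest-neighbour graph with Euclidean edge weights.

    Assumes the points are distinct, so each point is its own nearest match.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    adj = [dict() for _ in range(len(points))]
    for i in range(len(points)):
        # Index 0 of the sorted order is the point itself; skip it.
        for j in np.argsort(dist[i])[1:k + 1]:
            w = float(dist[i, j])
            adj[i][int(j)] = w
            adj[int(j)][i] = w
    return adj

def geodesic_distances(adj, seeds):
    """Multi-source Dijkstra: shortest graph distance to the nearest seed."""
    dist = [float("inf")] * len(adj)
    heap = []
    for s in seeds:
        dist[s] = 0.0
        heap.append((0.0, s))
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return np.array(dist)

def propagate_mask(points, seeds, k=2, sigma=1.0, threshold=0.5):
    """Hypothetical kernel: Gaussian falloff of geodesic distance from seeds."""
    adj = knn_graph(points, k)
    d = geodesic_distances(adj, seeds)
    score = np.exp(-(d ** 2) / (2.0 * sigma ** 2))  # unreachable points score 0
    return score >= threshold, score

# Two separated clusters along the x-axis; the seed is the first point.
points = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0],
                   [10, 0, 0], [11, 0, 0], [12, 0, 0]], dtype=float)
mask, score = propagate_mask(points, seeds=[0], k=2, sigma=2.0)
print(mask.tolist())
```

In this toy scene the second cluster is not connected to the first in the neighbour graph, so its geodesic distance is infinite and its score is zero regardless of `sigma`. That is the practical appeal of a geodesic measure over a plain Euclidean one: a selection spreads along connected structure rather than jumping across gaps.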