RADIO-ViPE enables open-vocabulary semantic SLAM from monocular video

By PulseAugur Editorial · [1 source] · 2026-04-30 04:00

Researchers have developed RADIO-ViPE, a semantic SLAM system that performs open-vocabulary grounding in dynamic environments using only monocular RGB video. The system fuses multi-modal embeddings from foundation models with geometric scene information, and it needs neither depth sensors nor pose initialization. The authors report state-of-the-art performance on the TUM-RGBD benchmark and robust semantic grounding for robotics and unconstrained video streams.
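To make "open-vocabulary grounding" concrete, here is a minimal sketch of the general idea: a text query is embedded into the same vector space as the map, and 3D points whose embeddings are similar enough to the query are returned. This is an illustration under stated assumptions, not the RADIO-ViPE implementation. The function names, the toy 3-dimensional embeddings, and the 0.8 threshold are all hypothetical. A real system would obtain embeddings from a foundation model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as matching nothing
    return dot / (norm_a * norm_b)


def ground_query(query_embedding, map_points, threshold=0.8):
    """Return 3D positions whose embedding matches the query.

    map_points is a list of (xyz, embedding) pairs. Both the data layout
    and the threshold are assumptions made for this sketch.
    """
    return [
        xyz
        for xyz, embedding in map_points
        if cosine_similarity(query_embedding, embedding) >= threshold
    ]


# Toy map: three 3D points, each with a hypothetical 3-dimensional embedding.
map_points = [
    ((0.0, 0.0, 1.0), [1.0, 0.0, 0.0]),
    ((0.5, 0.1, 1.2), [0.9, 0.1, 0.0]),
    ((2.0, 1.0, 3.0), [0.0, 1.0, 0.0]),
]

# Stand-in for the embedding of a natural language query.
query = [1.0, 0.0, 0.0]
matches = ground_query(query, map_points)
print(matches)
```

With this toy data the first two points are selected, because their embeddings point in nearly the same direction as the query, while the third is orthogonal to it and is rejected.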

IMPACT Enables open-vocabulary semantic grounding in dynamic environments using only monocular video, advancing robotics and video analysis.

RANK_REASON Academic paper introducing a new system for semantic SLAM.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CV TIER_1 English(EN) · Zaid Nasser, Mikhail Iumanov, Tianhao Li, Maxim Popov, Jaafar Mahmoud, Sergey Kolyubin · 2026-04-30 04:00

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments

arXiv:2604.26067v1 Announce Type: new Abstract: We present RADIO-ViPE (Reduce All Domains Into One -- Video Pose Engine), an online semantic SLAM system that enables geometry-aware open-vocabulary grounding, associating arbitrary natural language queries with localized 3D regions…
