USS: Unified Spatial-Semantic Prompts for Embodied Visual Tracking with Latent Dynamics Learning

New USS framework unifies spatial and semantic prompts for embodied visual tracking

By the PulseAugur editorial team · [2 sources] · 2026-06-24 14:25

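Researchers have introduced USS, a novel framework for embodied visual tracking (EVT) that moves beyond text-only target indication to a unified spatial-semantic prompting system. The approach integrates multiple prompt types, including text, points, bounding boxes, and masks, within a single architecture. USS uses a latent world model to predict future representations, which improves temporal robustness. Real-robot experiments show that explicit spatial cues raise tracking success rates, particularly in complex scenes with distractors and in long-horizon tasks, where USS outperforms text-only methods.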

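Impact: This research could lead to more robust and precise embodied AI systems capable of complex navigation and object tracking in real-world environments.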

Ranking rationale: This is a research paper detailing a new framework for a computer vision task.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.CV · Yuchen Xie, Xinyu Zhou, Kuangji Zuo, Yanshuo Lu, Fengrui Huang, Boyu Ma, Jianfei Yang · 2026-06-25 04:00

USS: Unified Spatial-Semantic Prompts for Embodied Visual Tracking with Latent Dynamics Learning

arXiv:2606.25880v1. Abstract: Embodied Visual Tracking (EVT) requires an agent to continuously follow a specified target while actively moving through dynamic environments. However, prevailing EVT paradigms predominantly rely on language-based target indication.…

arXiv cs.CV · Jianfei Yang · 2026-06-24 14:25

USS: Unified Spatial-Semantic Prompts for Embodied Visual Tracking with Latent Dynamics Learning

Embodied Visual Tracking (EVT) requires an agent to continuously follow a specified target while actively moving through dynamic environments. However, prevailing EVT paradigms predominantly rely on language-based target indication. While language is expressive and convenient, cl…
