PulseAugur
Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

New benchmark tests foundation models' 3D navigation and viewpoint-adjustment abilities

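Researchers have introduced TVRBench, a new benchmark designed to test whether foundation models can actively adjust their viewpoint in a 3D environment to match a target image. Current models perform poorly on this task, particularly at handling multi-turn visual history and at translating visual differences into embodied movement. Post-training techniques, especially vision-action SFT, show promise for improving performance, with one model exceeding a 50% success rate.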

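Impact: Establishes a new benchmark for evaluating and training the embodied spatial intelligence of foundation models, which could advance robotics and interactive AI.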

Ranking rationale: This is an academic paper introducing a new benchmark and evaluation method for foundation models.

Read on Hugging Face Daily Papers →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

    Target Viewpoint Reproduction task challenges foundation models to actively adjust 3D viewpoints to match target images, revealing limitations in visual history processing and embodied movement mapping, with a unified post-training framework improving success rates through variou…

  2. arXiv cs.CV TIER_1 English(EN) · Liyang Li, Muzhi Zhu, Zhiyue Zhao, Hengyu Zhao, Ke Liu, Linhao Zhong, Hao Chen, Chunhua Shen ·

    Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

    arXiv:2606.01247v1 Announce Type: new Abstract: Humans can reproduce the viewpoint specified by a target image through active head and body motion, yet spatial intelligence in foundation models has largely been studied as passive understanding of pre-collected observations. We in…