PulseAugur

New benchmark tests foundation models' 3D navigation and viewpoint adjustment

Researchers have introduced TVRBench, a benchmark that tests whether foundation models can actively adjust their viewpoint in 3D environments to match a target image. Current models struggle significantly with this task, particularly with using multi-turn visual history and with translating visual discrepancies into embodied movement. Post-training techniques, especially visual-action SFT, have shown promise in improving performance, with one model reaching over 50% success.

IMPACT Establishes a new benchmark for evaluating and training embodied spatial intelligence in foundation models, potentially driving progress in robotics and interactive AI.

RANK_REASON This is a research paper introducing a new benchmark and evaluation methodology for foundation models.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. Hugging Face Daily Papers TIER_1 English(EN) ·

    Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

    Target Viewpoint Reproduction task challenges foundation models to actively adjust 3D viewpoints to match target images, revealing limitations in visual history processing and embodied movement mapping, with a unified post-training framework improving success rates through variou…

  2. arXiv cs.CV TIER_1 English(EN) · Liyang Li, Muzhi Zhu, Zhiyue Zhao, Hengyu Zhao, Ke Liu, Linhao Zhong, Hao Chen, Chunhua Shen ·

    Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

    arXiv:2606.01247v1. Abstract: Humans can reproduce the viewpoint specified by a target image through active head and body motion, yet spatial intelligence in foundation models has largely been studied as passive understanding of pre-collected observations. We in…