Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned
A new research paper evaluates five state-of-the-art visual navigation models (VNMs) in real-world scenarios and finds limitations that success rates alone do not capture. The study, by Maeva Guerrier and colleagues, found that GNM, ViNT, NoMaD, NaviBridger, and CrossFormer frequently collide with objects, indicating a lack of geometric understanding. The models also struggle to differentiate between perceptually similar locations, and their performance degrades under environmental changes such as motion blur or sunflare. The researchers plan to release their evaluation codebase and dataset to promote reproducible benchmarking.
IMPACT: Reveals critical limitations in current visual navigation models, highlighting a need for improved geometric understanding and robustness in real-world robotic applications.