Researchers have developed Three-Step Nav, a hierarchical planner for zero-shot vision-and-language navigation (VLN) agents. It uses a three-view protocol to counter two failure modes common in current VLN systems powered by multimodal large language models (MLLMs): drifting off the intended route and halting before the goal is reached. The agent looks forward to spot landmarks, looks at its current view to check alignment with the active sub-goal, and looks backward to audit the trajectory so far. With this protocol, Three-Step Nav improves navigation accuracy without requiring any additional training.
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT Improves zero-shot navigation accuracy for agents using multimodal large language models.
RANK_REASON This is a research paper detailing a new method for vision-and-language navigation.
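As a rough illustration only, the three-view loop described above can be sketched as a toy Python controller. This is not the authors' implementation. The graph world, the landmark labels, and the functions `look_forward`, `look_now`, `look_backward` and `navigate` are hypothetical stand-ins. Breadth-first search and exact label matching substitute for the MLLM's visual judgments. The sketch shows only how the three views might divide the work: forward picks the next step, now confirms sub-goal progress, and backward audits the path so the agent neither stops early nor claims progress it cannot verify.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Hypothetical toy world: adjacency list plus the landmark visible at each node.
Graph = Dict[str, List[str]]
Labels = Dict[str, str]


def look_forward(graph: Graph, labels: Labels, start: str, target: str) -> Optional[str]:
    """Stand-in for landmark prediction: BFS to the nearest node showing
    `target`, returning the first hop toward it (None if unreachable)."""
    prev: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if labels.get(node) == target and node != start:
            # Walk back along BFS parents until we reach the first hop.
            while prev[node] != start:
                node = prev[node]
            return node
        for nb in graph[node]:
            if nb not in prev:
                prev[nb] = node
                queue.append(nb)
    return None


def look_now(labels: Labels, node: str, subgoals: List[str], idx: int) -> int:
    """Advance the sub-goal index if the current view matches the active sub-goal."""
    if idx < len(subgoals) and labels.get(node) == subgoals[idx]:
        return idx + 1
    return idx


def look_backward(trajectory: List[str], labels: Labels, subgoals: List[str]) -> int:
    """Audit: count how many sub-goals the trajectory visits in order."""
    k = 0
    for node in trajectory:
        if k < len(subgoals) and labels.get(node) == subgoals[k]:
            k += 1
    return k


def navigate(graph: Graph, labels: Labels, start: str,
             subgoals: List[str], max_steps: int = 20) -> Tuple[List[str], bool]:
    """Return (trajectory, completed). Stops only when the audit confirms
    every sub-goal, which guards against premature halting."""
    trajectory = [start]
    node, idx = start, 0
    for _ in range(max_steps):
        idx = look_now(labels, node, subgoals, idx)
        audited = look_backward(trajectory, labels, subgoals)
        idx = min(idx, audited)  # the audit may revoke claimed progress, never add it
        if audited == len(subgoals):
            return trajectory, True
        nxt = look_forward(graph, labels, node, subgoals[idx])
        if nxt is None:
            break  # target landmark unreachable: report failure rather than stop silently
        node = nxt
        trajectory.append(node)
    return trajectory, False


if __name__ == "__main__":
    graph = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B", "E"], "E": ["D"]}
    labels = {"A": "hall", "B": "door", "C": "closet", "D": "stairs", "E": "kitchen"}
    print(navigate(graph, labels, "A", ["door", "stairs", "kitchen"]))
```

In this toy world the agent walks A → B → D → E and stops only once the backward audit confirms all three landmarks in order. A real system would replace each function with an MLLM query over camera views, where the backward audit matters more because the per-step judgments can be wrong.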