Vision-Language Models Fail at Basic Path-Following Tasks

By the PulseAugur editorial team · [1 source] · 2026-05-15 06:48

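Researchers have identified a significant failure mode in vision-language models (VLMs) on visual path following. Even advanced VLMs struggle to stay on a designated path and often switch to a nearby, visually similar alternative. This problem, termed "local competition," persists despite scaling up model size, adding reasoning capabilities, or giving explicit tracing instructions. It is not confined to controlled settings; it also appears in real-world scenes such as tangled cables and subway maps.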

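Impact: Identifies a critical failure of vision-language models on tasks that require precise visual navigation, which may affect robotics and autonomous systems.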

Ranking rationale: This cluster contains one academic paper that details a new finding about the limitations of existing models.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · English (EN) · Albert No · 2026-05-15 06:48

VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following

Vision-language models (VLMs) achieve strong performance on multimodal benchmarks, but may still lack robust control over basic visual operations. We study line tracing, where a model must follow a selected visual path through successive local continuations. To isolate t…

