The Sword, Shield, and Achilles' Heel: Characterizing the Linguistic Inductive Bias of Large Language Models for Spatial Reasoning in Navigation Planning
New large language model research strengthens spatial reasoning beyond symbolic patterns
By the PulseAugur editorial team · [9 sources]
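Researchers are developing new methods to improve the spatial reasoning of large language models (LLMs) by moving beyond symbolic pattern matching toward genuine geometric understanding. One approach introduces a spatial language model (SLM) that treats location as a first-class modality and is trained and evaluated on dedicated datasets and benchmarks. Another, Imaginative Perception Tokens (IPT), augments multimodal models by letting them infer unseen spatial configurations, improving performance on tasks such as path tracing and multi-view counting. Other work examines the effects of linguistic bias and the importance of grounding LLM spatial predictions in metric space.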
arXiv:2606.04381v1 Announce Type: cross Abstract: Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric reasoning over space. Because LLMs operate on discrete…
arXiv cs.AI
Mahtab Bigverdi, Lindsey Li, Weikai Huang, Yiming Liu, Jaemin Cho, Jieyu Zhang, Tuhin Kundu, Chris Dangjoo Kim, Zelun Luo, Linda Shapiro, Ranjay Krishna
arXiv:2606.03988v1 Announce Type: new Abstract: Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical information is not directly observable. Many such problems require imaginative perception: inferring what would be seen from an unseen viewpoint, tracing paths through occlud…
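The abstract frames imaginative perception as inferring what would be seen from an unseen viewpoint. As a purely geometric illustration, not the paper's method, the sketch below re-expresses a 2D world point in an observer's egocentric frame so that "left" or "right" is computed from coordinates rather than inferred from wording. The function names `to_egocentric` and `relation`, the top-down 2D setting, and the heading convention (radians, counterclockwise from +x) are assumptions.

```python
import math

def to_egocentric(obs_x, obs_y, heading_rad, px, py):
    """Express world point (px, py) in the observer's frame.

    Assumed conventions: 2D top-down world, heading in radians measured
    counterclockwise from the +x axis. Returns (forward, left): forward is
    the distance along the facing direction, left is the distance 90 degrees
    counterclockwise from it.
    """
    dx, dy = px - obs_x, py - obs_y
    # Rotate the offset by -heading so the facing direction becomes +forward.
    forward = dx * math.cos(heading_rad) + dy * math.sin(heading_rad)
    left = -dx * math.sin(heading_rad) + dy * math.cos(heading_rad)
    return forward, left

def relation(obs_x, obs_y, heading_rad, px, py, eps=1e-9):
    """Hypothetical helper: coarse (side, depth) relation of a point to the observer."""
    forward, left = to_egocentric(obs_x, obs_y, heading_rad, px, py)
    side = "left" if left > eps else "right" if left < -eps else "center"
    depth = "front" if forward > eps else "behind" if forward < -eps else "level"
    return side, depth

# Same object at (1, 1), observer at the origin, two different headings.
print(relation(0, 0, 0.0, 1, 1))          # ('left', 'front')
print(relation(0, 0, math.pi / 2, 1, 1))  # ('right', 'front')
```

Changing only the observer's heading flips the answer for the same object, which is the kind of viewpoint dependence that a reading based only on scene wording may miss.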
Latent reasoning has improved sequential recommendation by iteratively refining representations before prediction, but does it help spatial prediction? We find that the answer depends on whether reasoning is grounded in the underlying metric space. Without such grounding, latent …
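The abstract argues that latent reasoning helps spatial prediction only when it is grounded in the underlying metric space, but it is truncated before describing how. As a minimal sketch of what metric grounding can look like at the output stage (an assumption, not the paper's mechanism), the code below ranks candidate locations by great-circle distance to a predicted point using the haversine formula. The helper names, candidate labels, and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp guards against tiny floating-point overshoot above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def rank_by_distance(pred, candidates):
    """Hypothetical helper: sort candidates (name -> (lat, lon)) nearest-first to pred."""
    scored = [(name, haversine_km(*pred, *loc)) for name, loc in candidates.items()]
    return sorted(scored, key=lambda item: item[1])

candidates = {"A": (0.0, 1.0), "B": (0.0, 0.1), "C": (10.0, 0.0)}
print([name for name, _ in rank_by_distance((0.0, 0.0), candidates)])  # ['B', 'A', 'C']
```

Because the ranking depends only on geometry, candidates are ordered by how far they actually are from the prediction, not by how similar their identifiers look.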
arXiv:2606.01914v1 Announce Type: new
Multimodal large language models (MLLMs) remain unreliable on spatial multiple-choice questions, and their failures are often attributed to poorly attended visual information. In this work, we identify a complementary failure mode, spatial lexical bias: adding a spatial relation …
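The abstract is cut off before it specifies the intervention behind spatial lexical bias, so the sketch below does not reproduce it. It shows one generic way to quantify this kind of wording sensitivity: run a model on paired multiple-choice variants that differ only in wording and report how often the chosen option flips. The `predict` callable, the `flip_rate` name, and the toy predictor are hypothetical stand-ins for a real MLLM.

```python
from typing import Callable, Sequence, Tuple

# A multiple-choice item: (question text, answer options).
Item = Tuple[str, Sequence[str]]

def flip_rate(predict: Callable[[str, Sequence[str]], int],
              pairs: Sequence[Tuple[Item, Item]]) -> float:
    """Fraction of (original, perturbed) pairs where the predicted option index changes.

    `predict` is a hypothetical stand-in for a model call that returns the index
    of the chosen option. Each pair should differ only in wording, so a non-zero
    flip rate signals sensitivity to that wording rather than to the scene.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    flips = sum(predict(*orig) != predict(*pert) for orig, pert in pairs)
    return flips / len(pairs)

SPATIAL_WORDS = ("left", "right", "above", "below", "behind", "in front of")

def toy_biased_predict(question: str, options: Sequence[str]) -> int:
    """Toy model (assumption): picks the first option containing a spatial word
    (naive substring match), otherwise option 0. It ignores the question."""
    for i, opt in enumerate(options):
        if any(word in opt.lower() for word in SPATIAL_WORDS):
            return i
    return 0

pairs = [
    (("What is on the table?", ["a cup", "a lamp"]),
     ("What is on the table?", ["a cup", "a lamp on the left"])),
    # Control pair: identical wording, so no flip is expected.
    (("Which color is the box?", ["red", "blue"]),
     ("Which color is the box?", ["red", "blue"])),
]
print(flip_rate(toy_biased_predict, pairs))  # 0.5
```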
arXiv:2605.31404v1 Announce Type: cross
Large Language Model (LLM)-based navigation systems commonly construct explicit spatial representations (e.g., topological graphs, semantic raster maps) and translate them into textual descriptions as LLMs' inputs. However, the linguistic structures of such text-based spatial rep…
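The abstract notes that LLM-based navigation systems often turn explicit spatial representations, such as topological graphs, into textual descriptions before passing them to the LLM. A minimal sketch, assuming a toy graph of named rooms with labeled directional edges, renders the same graph in two different linguistic structures: a compact adjacency listing and one sentence per edge. The graph, relation labels, templates, and function names are hypothetical; the excerpt does not say which encodings the paper compares.

```python
from typing import Dict, List, Tuple

# Hypothetical topological graph: node -> list of (relation, neighbour).
Graph = Dict[str, List[Tuple[str, str]]]

def as_adjacency_list(graph: Graph) -> str:
    """One line per node listing its neighbours (compact, list-like structure)."""
    lines = []
    for node in sorted(graph):
        neighbours = ", ".join(f"{nbr} ({rel})" for rel, nbr in graph[node])
        lines.append(f"{node}: {neighbours}" if neighbours else f"{node}: none")
    return "\n".join(lines)

def as_sentences(graph: Graph) -> str:
    """One natural-language sentence per edge (narrative structure)."""
    sentences = [
        f"From the {node}, the {nbr} is {rel}."
        for node in sorted(graph)
        for rel, nbr in graph[node]
    ]
    return " ".join(sentences)

graph: Graph = {
    "kitchen": [("to the north", "hallway")],
    "hallway": [("to the south", "kitchen"), ("to the east", "bedroom")],
    "bedroom": [("to the west", "hallway")],
}
print(as_adjacency_list(graph))
print(as_sentences(graph))
```

Both outputs carry identical graph content, so holding the graph fixed and varying only the template is one way to probe the sensitivity to linguistic structure that the abstract raises.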