Researchers have developed GeoThinker, a framework that improves spatial reasoning in multimodal large language models (MLLMs) by actively integrating geometric information. Earlier methods fuse geometric features passively. GeoThinker instead lets the model selectively retrieve and incorporate the geometric data its internal reasoning calls for. This active integration relies on two mechanisms, Spatial-Grounded Fusion and Importance Gating. It achieves state-of-the-art performance on spatial intelligence benchmarks, including a peak score of 72.6 on VSI-Bench.
Impact: Introduces a method for actively integrating geometric information into MLLMs, which may improve performance on complex spatial tasks.
Ranking rationale: Academic paper introducing a new framework for spatial reasoning in MLLMs.
AI-generated summary · Google Gemini · from 1 source.