PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

New methods and benchmarks improve visual grounding in multimodal large language models

By the PulseAugur editorial team · [3 sources] · 2026-05-22 04:00

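Researchers have developed new methods to improve the visual grounding capabilities of multimodal large language models (MLLMs). One approach, PGT, uses procedurally generated tasks built from geometric primitives to provide denser supervision, and it achieves significant gains across a range of benchmarks. A separate effort, AgroVG, introduces a large-scale benchmark dedicated to agricultural visual grounding, highlighting the limitations of current models in complex scenes.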

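Impact: Advances in visual grounding are essential for enabling more sophisticated AI applications in areas such as agriculture and general perception tasks.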

Ranking rationale: Two research papers introduce new methods and benchmarks for visual grounding in multimodal large language models.

AI-generated summary · Google Gemini · based on 3 sources.

Sources [3]

arXiv cs.AI TIER_1 English(EN) · Rim Assouel, Amir Bar, Michal Drozdzal, Adriana Romero-Soriano · 2026-05-25 04:00

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

arXiv:2605.23883v1 Announce Type: cross Abstract: Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framewor…

arXiv cs.CV TIER_1 English(EN) · Adriana Romero-Soriano · 2026-05-22 17:45

PGT: Procedurally Generated Tasks for improving visual grounding in MLLMs

Despite remarkable progress in Multimodal Large Language Models (MLLMs), these models still struggle with fine-grained understanding tasks. In this work, we propose Procedurally Generated Tasks (PGT), a simple data-driven framework that serves a dual purpose: inducing fine-graine…

arXiv cs.CV TIER_1 English(EN) · Haocheng Li, Juepeng Zheng, Zenghao Yang, Kaiqi Du, Guilong Xiao, Gengmeng Pu, Haohuan Fu, Jianxi Huang · 2026-05-22 04:00

AgroVG: A Large-Scale Multi-Source Benchmark for Agricultural Visual Grounding

arXiv:2605.22034v1 Announce Type: new Abstract: Visual grounding, the task of localizing objects described by natural-language expressions, is a foundational capability for agricultural AI systems, enabling applications such as selective weeding, disease monitoring, and targeted …