PulseAugur
ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

ProMSA agent advances knowledge-based visual question answering

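Researchers have developed ProMSA, a new agent for knowledge-based visual question answering (KB-VQA). Unlike prior methods that rely on a fixed retrieval pipeline, ProMSA adaptively chooses between image search, text search, and stopping, conditioned on its tool-call budget and on which results have already been retrieved (deduplication). The agent is trained with rejection-sampling SFT followed by a sequence-level RL objective called TN-GSPO. Experiments on the E-VQA and InfoSeek datasets show that ProMSA improves both retrieval and end-to-end accuracy over existing RAG and agent baselines.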
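The adaptive loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ProMSA's implementation: the policy, the tool functions, the action labels, and the integer budget of tool calls are all hypothetical, and deduplication is modeled simply as skipping documents whose ids were already retrieved. The policy receives the full state (remaining budget and seen ids), so it can condition its next choice on both.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical action labels: the summary only says the agent picks
# image search, text search, or stop.
IMAGE_SEARCH, TEXT_SEARCH, STOP = "image_search", "text_search", "stop"

# Hypothetical tool signature: a query in, (doc_id, text) hits out.
Tool = Callable[[str], Iterable[Tuple[str, str]]]


@dataclass
class SearchState:
    question: str
    budget: int                                    # remaining tool calls
    seen: Set[str] = field(default_factory=set)    # doc ids already retrieved
    evidence: List[str] = field(default_factory=list)


def run_agent(question: str,
              policy: Callable[[SearchState], Tuple[str, str]],
              tools: Dict[str, Tool],
              budget: int = 4) -> SearchState:
    """Let the policy pick actions until it stops or the budget runs out."""
    state = SearchState(question=question, budget=budget)
    while state.budget > 0:
        action, query = policy(state)   # policy sees budget and seen ids
        if action == STOP:
            break
        state.budget -= 1               # each tool call spends one unit
        for doc_id, text in tools[action](query):
            if doc_id in state.seen:    # drop duplicate hits
                continue
            state.seen.add(doc_id)
            state.evidence.append(text)
    return state


# Toy policy and tools, for illustration only.
def toy_policy(state: SearchState) -> Tuple[str, str]:
    if not state.evidence:
        return IMAGE_SEARCH, state.question
    if len(state.evidence) == 1:
        return TEXT_SEARCH, state.question
    return STOP, ""


toy_tools: Dict[str, Tool] = {
    IMAGE_SEARCH: lambda q: [("Eiffel_Tower", "Eiffel Tower entity page")],
    TEXT_SEARCH: lambda q: [("Eiffel_Tower", "Eiffel Tower entity page"),
                            ("Paris", "Paris article")],
}

if __name__ == "__main__":
    final = run_agent("Which city is this landmark in?", toy_policy, toy_tools)
    print(final.budget, final.evidence)
```

Running the script prints `2 ['Eiffel Tower entity page', 'Paris article']`: the text search returns the Eiffel Tower page a second time, and it is dropped as a duplicate. A trained agent would take the place of `toy_policy`; the hard-coded version only exercises the loop.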

Impact: Advances agent-based reasoning for multimodal tasks and could improve complex information retrieval systems.

Ranking rationale: A research paper detailing a new AI agent and its methodology has been published.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 3 sources.


Sources [3]

  1. arXiv cs.AI TIER_1 English(EN) · ZhengXian Wu, Hangrui Xu, Kai Shi, Zhuohong Chen, Yunyao Yu, Chuanrui Zhang, Zirui Liao, Jun Yang, Zhenyu Yang, Haonan Lu, Haoqian Wang ·

    ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

    arXiv:2606.27974v1 Announce Type: cross Abstract: Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-selected retriever and a static t…

  2. arXiv cs.AI TIER_1 English(EN) · Haoqian Wang ·

    ProMSA: Progressive Multimodal Search Agents for Knowledge-Intensive Visual Question Answering

    Knowledge-based Visual Question Answering (KB-VQA) requires models to combine image understanding with external knowledge. Most prior methods use a fixed retrieve-then-generate pipeline with a pre-selected retriever and a static top-k setting, which is not adaptive during reasoni…

  3. Hugging Face Daily Papers TIER_1 English(EN) ·

    ProMSA: Progressive Multimodal Search Agents for Knowledge-Based Visual Question Answering

    A progressive multimodal search agent for knowledge-based visual question answering that adaptively selects search strategies and optimizes through sequence-level reinforcement learning.
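The sources describe TN-GSPO only as a sequence-level RL objective, and what the "TN" variant changes is not stated here. As background, the sketch below computes the standard GSPO (Group Sequence Policy Optimization) clipped surrogate, which the name suggests TN-GSPO builds on: rewards are normalised within a group of responses to the same prompt, and each response gets a single importance ratio (the exponentiated mean of its per-token log-probability differences) instead of one ratio per token. The function names, the use of the population standard deviation, and the example `eps` are assumptions for illustration; this is not the paper's objective.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def group_advantages(rewards: Sequence[float]) -> List[float]:
    """Normalise rewards within one group of responses to the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)   # population std (assumption)
    return [(r - mu) / (sigma + 1e-8) for r in rewards]


def sequence_ratio(logp_new: Sequence[float], logp_old: Sequence[float]) -> float:
    """Length-normalised sequence importance ratio:
    exp(mean over tokens of [log pi_new(y_t) - log pi_old(y_t)])."""
    diffs = [a - b for a, b in zip(logp_new, logp_old)]
    return math.exp(sum(diffs) / len(diffs))


def gspo_objective(group: Sequence[Tuple[Sequence[float], Sequence[float]]],
                   rewards: Sequence[float], eps: float) -> float:
    """Clipped surrogate averaged over the group (to be maximised)."""
    terms = []
    for (lp_new, lp_old), adv in zip(group, group_advantages(rewards)):
        s = sequence_ratio(lp_new, lp_old)
        s_clipped = min(max(s, 1.0 - eps), 1.0 + eps)
        terms.append(min(s * adv, s_clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    # Two responses: the first is rewarded and became more likely under pi_new.
    group = [([-1.0, -1.0], [-1.5, -1.5]), ([-2.0], [-2.0])]
    print(gspo_objective(group, rewards=[1.0, 0.0], eps=0.2))
```

The example prints roughly 0.1: the rewarded response's ratio exp(0.5) ≈ 1.65 is clipped to 1 + eps = 1.2, while the unrewarded response's ratio is exactly 1, so the two terms average to about (1.2 - 1) / 2.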