PulseAugur
ReasonCLIP-58M: Visually Grounded Commonsense Reasoning Supervision for CLIP

ReasonCLIP-58M strengthens CLIP models with visual commonsense reasoning

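Researchers have introduced ReasonCLIP-58M, a new framework for continued pretraining of CLIP-style models. The method integrates large-scale reasoning supervision to strengthen visually grounded commonsense reasoning and compositional reasoning. It uses a two-stage strategy that preserves descriptive alignment while progressively adding reasoning signals, and it is supported by a new dataset and a diagnostic evaluation benchmark. ReasonCLIP-58M can serve as a plug-and-play visual encoder for multimodal large language models, improving performance without adding inference cost.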
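The idea of keeping descriptive alignment while gradually adding a reasoning signal can be illustrated with a small sketch. This is not the paper's implementation: the summary does not give the loss or the schedule, so the symmetric contrastive (InfoNCE) loss below is the standard CLIP-style objective, and `reasoning_weight`, `combined_loss`, `stage1_steps`, and `stage2_steps` are hypothetical names and choices made only for illustration.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: image i should match text i within the batch."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    n = len(imgs)
    # logits[i][j] = cosine similarity of image i and text j, divided by temperature
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class for row i is column i.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    cols = [list(c) for c in zip(*logits)]  # text-to-image direction
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(cols))

def reasoning_weight(step, stage1_steps, stage2_steps):
    """Hypothetical schedule: 0 during stage 1, then a linear ramp to 1 in stage 2."""
    if step < stage1_steps:
        return 0.0
    return min(1.0, (step - stage1_steps) / stage2_steps)

def combined_loss(image_embs, caption_embs, reasoning_embs, step,
                  stage1_steps=100, stage2_steps=100):
    """Assumed objective: descriptive term always on, reasoning term phased in."""
    w = reasoning_weight(step, stage1_steps, stage2_steps)
    descriptive = clip_contrastive_loss(image_embs, caption_embs)
    reasoning = clip_contrastive_loss(image_embs, reasoning_embs)
    return descriptive + w * reasoning
```

In this sketch the descriptive term keeps full weight throughout, which is one simple way to avoid losing the original image-caption alignment while the reasoning term is introduced. The text encoder is used only during training, so an image encoder trained this way would have the same inference cost as before.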

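Impact: Strengthens visual reasoning in multimodal models, which may improve performance in applications that require deeper image understanding.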

Ranking rationale: This cluster contains a paper detailing a new method for pretraining vision models.


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Sicheng Zhang, Muzammal Naseer, Binzhu Xie, Naufal Suryanto, Shi Qiu, Jamal Bentahar, Naveed Akhtar, Mubarak Shah ·

    ReasonCLIP-58M: Visually Grounded Commonsense Reasoning Supervision for CLIP

    arXiv:2606.26794v1 Announce Type: cross Abstract: CLIP and its variants are widely adopted visual backbones in multimodal systems, but their pretraining remains dominated by descriptive image-text alignment. As downstream applications increasingly demand visually grounded commons…

  2. arXiv cs.CV TIER_1 English(EN) · Mubarak Shah ·

    ReasonCLIP-58M: Visually Grounded Commonsense Reasoning Supervision for CLIP

    CLIP and its variants are widely adopted visual backbones in multimodal systems, but their pretraining remains dominated by descriptive image-text alignment. As downstream applications increasingly demand visually grounded commonsense inference and compositional reasoning, it rem…