Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

New method adapts vision-language models for multi-label image recognition

By the PulseAugur editorial team · [1 source] · 2026-06-11 04:00

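Researchers have developed a new unsupervised framework that adapts vision-language models (VLMs) for more comprehensive multi-label image recognition. The method addresses the tendency of VLMs to focus on a single iconic object while overlooking other relevant labels in the image. Using a "crop" stage and a "stitch" stage, the framework improves the model's ability to recognize multiple objects and to adjust label distributions, without manual annotation. Experiments show that the approach significantly outperforms existing unsupervised methods and even surpasses some weakly supervised baselines.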
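The summary does not say how the crop stage works internally, so the following is only a generic illustration of why scoring local crops can surface labels that a whole-image score misses. It is not the paper's method. The function names, the similarity scores, and the 0.5 threshold are all hypothetical values made up for this sketch.

```python
from typing import Dict, List


def aggregate_crop_scores(crop_scores: List[Dict[str, float]]) -> Dict[str, float]:
    """For each label, keep the highest score seen in any view (max pooling)."""
    merged: Dict[str, float] = {}
    for scores in crop_scores:
        for label, score in scores.items():
            merged[label] = max(score, merged.get(label, float("-inf")))
    return merged


def predict_labels(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return every label whose score reaches the threshold, sorted by name."""
    return sorted(label for label, s in scores.items() if s >= threshold)


# Hypothetical image-text similarity scores, invented for illustration only.
# The whole-image view is dominated by the iconic object ("dog").
whole_image = {"dog": 0.82, "frisbee": 0.31, "tree": 0.28}

# Hypothetical scores for three local crops of the same image.
crops = [
    {"dog": 0.85, "frisbee": 0.20, "tree": 0.15},
    {"dog": 0.25, "frisbee": 0.74, "tree": 0.22},
    {"dog": 0.18, "frisbee": 0.21, "tree": 0.69},
]

THRESHOLD = 0.5  # assumed decision threshold

global_only = predict_labels(whole_image, THRESHOLD)
with_crops = predict_labels(aggregate_crop_scores([whole_image] + crops), THRESHOLD)

print("whole image only:", global_only)
print("whole image + crops:", with_crops)
```

In this toy setup the whole-image scores yield only the iconic label, while pooling over crops also recovers the smaller objects. How the actual framework selects crops, and what its stitch stage does to the label distribution, is not described in the summary.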

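Impact: Enables more comprehensive image understanding without manual labeling, which could improve applications such as image search and content moderation.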

Ranking rationale: The cluster contains one academic paper detailing a new method for adapting existing AI models.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Cheng Chen, Jingyu Zhou, Yifan Zhao, Jia Li · 2026-06-11 04:00

Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

arXiv:2606.11626v1. Abstract: Understanding multi-label images remains a challenging task in computer vision. With the rapid progress of vision-language multimodal learning, vision-language models (VLMs) enable zero-shot recognition without labeled data. However…
