Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

New SAGA framework uses multimodal large language models to enhance visual embeddings for image retrieval

By the PulseAugur editorial team · [1 source] · 2026-06-16 04:00

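Researchers have developed a framework called SAGA that uses multimodal large language models (MLLMs) to improve visual embeddings for image retrieval. Conventional training reduces each image pair to a single scalar distance that pushes or pulls the whole embedding uniformly. SAGA instead draws attribute-specific gradients from a frozen MLLM to provide finer-grained supervision. This strengthens the encoder's ability to capture the attributes that distinguish one image from another, and the summary reports significantly better zero-shot image retrieval on multiple benchmark datasets.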
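To make the "uniform scalar" point concrete, here is a minimal sketch of a conventional contrastive pair loss. It illustrates the baseline style of supervision that the summary contrasts SAGA with, not SAGA itself, whose details are not given here. The function name `pair_loss_grad`, the margin value, and the toy vectors are all illustrative assumptions.

```python
import numpy as np

def pair_loss_grad(a, b, same_class, margin=1.0):
    """Gradient of a standard contrastive pair loss with respect to embedding a.

    Positive pair: L = ||a - b||^2                 -> grad = 2 (a - b)
    Negative pair: L = max(0, margin - ||a - b||)^2
                                                   -> grad = -2 (margin - d) (a - b) / d
    (hypothetical helper for illustration; not taken from the paper)
    """
    diff = a - b
    d = np.linalg.norm(diff)
    if same_class:
        return 2.0 * diff
    if d >= margin or d == 0.0:
        # Negative pair already far enough apart (or degenerate): no gradient.
        return np.zeros_like(a)
    return -2.0 * (margin - d) * diff / d

# Toy embeddings (assumed values): they differ in dimensions 0 and 2 only.
a = np.array([0.3, 0.1, 0.0])
b = np.array([0.0, 0.1, 0.4])

g_pos = pair_loss_grad(a, b, same_class=True)   # pulls a toward b
g_neg = pair_loss_grad(a, b, same_class=False)  # pushes a away from b
```

In both cases the gradient is a scalar multiple of `a - b`: the class label only sets the sign and magnitude, so every differing dimension is pulled or pushed together. Per the summary, attribute-aware supervision is meant to replace exactly this uniform signal.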

Impact: Enhances image retrieval by giving visual embeddings attribute-aware supervision, outperforming state-of-the-art baselines.

Ranking rationale: This cluster contains an academic paper detailing a new research framework and method.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Shubhang Bhatnagar, Dheeraj Baiju, Narendra Ahuja · 2026-06-16 04:00

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

arXiv:2606.15134v1 Announce Type: cross Abstract: Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed …
