Evaluating Open-Source Vision-Language Models on Grocery Product Retrieval Accuracy

By PulseAugur Editorial · [1 source] · 2026-05-18 08:20

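A new paper evaluates 190 open-source vision-language models (VLMs) on grocery product retrieval, a key component of checkout-free retail. The study finds that data quality matters more than model scale for improving accuracy. It also reports that smaller, efficient models can outperform larger ones when trained on cleaner data, and it introduces a new metric called "Semantic Power Density" to measure model efficiency. Although current state-of-the-art models are strong at recalling relevant products, they still struggle to rank visually similar products precisely.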
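The gap between strong recall and weak fine-grained ranking can be illustrated with a small sketch. The code below is not the paper's evaluation protocol: the embeddings are made-up toy vectors, the SKU names are hypothetical, and `rank_gallery` and `recall_at_k` are illustrative helpers. It only assumes the common setup in which a query embedding is compared to gallery embeddings by cosine similarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def rank_gallery(query, gallery):
    """Return gallery SKU ids sorted by descending similarity to the query."""
    return sorted(gallery, key=lambda sku: cosine(query, gallery[sku]), reverse=True)


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose true SKU appears among the top-k results."""
    hits = sum(
        1 for true_sku, emb in queries if true_sku in rank_gallery(emb, gallery)[:k]
    )
    return hits / len(queries)


# Toy embeddings (invented for illustration): two nearly identical cola
# variants and one visually distinct product.
gallery = {
    "cola_regular": [1.0, 0.0, 0.10],
    "cola_zero": [1.0, 0.0, 0.12],
    "oat_milk": [0.0, 1.0, 0.0],
}

# Each query is (true SKU, query embedding).
queries = [
    ("cola_regular", [1.0, 0.0, 0.13]),  # lands slightly closer to cola_zero
    ("cola_zero", [1.0, 0.0, 0.14]),
    ("oat_milk", [0.1, 1.0, 0.0]),
]

if __name__ == "__main__":
    for k in (1, 2):
        print(f"Recall@{k}: {recall_at_k(queries, gallery, k):.3f}")
```

In this toy setup every true product is found within the top two results, yet the first query ranks the look-alike variant first, so Recall@1 is lower than Recall@2. That is the pattern the summary describes: the right product is retrieved, but not reliably placed ahead of its near-duplicates.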

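Impact: Identifies the key factors for improving grocery product retrieval accuracy with open-source VLMs, with possible implications for retail automation.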

Ranking rationale: An academic paper evaluating open-source models on a specific task.

Read on arXiv cs.CV →

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · English (EN) · Rowel O. Atienza · 2026-05-18 08:20

What Matters for Grocery Product Retrieval with Open Source Vision Language Models

Multimodal product retrieval (MPR) underpins checkout-free retail and automated inventory systems, yet it demands fine-grained SKU discrimination that standard vision-language benchmarks fail to capture. We present the first systematic zero-shot evaluation of 190 open-source VLMs…
