Text-Guided Visual Representation Learning for Robust Multimodal E-Commerce Recommendation

New framework improves e-commerce recommendation through text-guided visual learning

By the PulseAugur editorial team · [1 source] · 2026-05-17 10:20

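Researchers have developed a new framework called Text-Guided Q-Former (TGQ-Former) to improve multimodal recommendation systems in e-commerce. The method uses structured metadata to guide the extraction of visual information from product images, which helps filter out noise such as promotional overlays and background clutter. Experiments show that TGQ-Former raises Hit Rate@100 by an average of 6.04% on large-scale datasets, significantly improving retrieval accuracy.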
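For readers unfamiliar with the metric, the following is a minimal sketch of how Hit Rate@K is commonly computed: the fraction of queries for which at least one relevant item appears among the top K retrieved items. This is one common definition and an assumption here, since the summary does not state the paper's exact evaluation protocol. The function name `hit_rate_at_k` and the toy item IDs are hypothetical. The summary also does not say whether the 6.04% gain is absolute or relative, so the sketch computes both.

```python
from typing import Dict, Sequence, Set


def hit_rate_at_k(
    ranked: Dict[str, Sequence[str]],
    relevant: Dict[str, Set[str]],
    k: int = 100,
) -> float:
    """Fraction of queries with at least one relevant item in the top-k list.

    Assumed definition; the paper's exact protocol is not given in the summary.
    Queries with no known relevant items are skipped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    queries = [q for q in ranked if relevant.get(q)]
    if not queries:
        return 0.0
    hits = sum(
        1
        for q in queries
        if any(item in relevant[q] for item in ranked[q][:k])
    )
    return hits / len(queries)


# Toy data with hypothetical item IDs (not taken from the paper).
ranked = {
    "q1": ["a", "b", "c"],
    "q2": ["d", "e", "f"],
    "q3": ["g", "h", "i"],
    "q4": ["j", "k", "l"],
}
relevant = {"q1": {"b"}, "q2": {"f"}, "q3": {"z"}, "q4": {"j"}}

# A small and a larger cutoff stand in for a weaker and a stronger system.
baseline = hit_rate_at_k(ranked, relevant, k=1)  # only q4 hits
improved = hit_rate_at_k(ranked, relevant, k=3)  # q1, q2 and q4 hit

absolute_gain = improved - baseline               # difference in hit rate
relative_gain = (improved - baseline) / baseline  # gain as a ratio of the baseline
```

The two gain figures can differ widely for the same pair of systems, which is why a reported percentage is easier to interpret once it is known which of the two it refers to.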

Impact: Better integration of visual and textual data improves item retrieval accuracy, which strengthens e-commerce recommendation systems.

Ranking rationale: An academic paper detailing a new approach to multimodal representation learning was published.

Read on arXiv cs.IR (Information Retrieval)

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.IR (Information Retrieval) · Jungong Han · 2026-05-17 10:20

Text-Guided Visual Representation Learning for Robust Multimodal E-Commerce Recommendation

Multimodal item embeddings are crucial for e-commerce item-to-item (I2I) retrieval, yet real-world product images often contain promotional overlays and background clutter that inject spurious visual cues and degrade retrieval robustness. This issue is particularly pronounced in …
