GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

GraSP-VL method unlocks semantic granularity in vision-language embeddings

By the PulseAugur editorial team · [1 source] · 2026-05-18 01:10

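Researchers have developed GraSP-VL, a method that makes better use of frozen vision-language model (VLM) embeddings by treating embedding length as a semantic interface. The method learns a shared prefix transformation in which shorter prefixes encode coarse semantic roles and longer prefixes reveal finer distinctions. Experiments on the COCO and Flickr30K datasets show that GraSP-VL effectively reorganizes VLM embeddings into a truncatable semantic prefix interface and outperforms simple compression techniques.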
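To make the idea of a "truncatable prefix interface" concrete, here is a minimal sketch under stated assumptions. It is not GraSP-VL: the paper learns its shared prefix transformation, whereas this sketch substitutes PCA, a plain linear compression of the kind the summary says GraSP-VL outperforms, so that the coordinates of the transformed vector are ordered and any prefix of length k can be used on its own. The function names (`fit_prefix_transform`, `prefix_embed`, `recall_at_1`) and the synthetic embeddings are hypothetical; real inputs would be frozen image and caption embeddings from a VLM.

```python
import numpy as np


def fit_prefix_transform(emb: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit one shared linear map whose output coordinates are ordered by
    explained variance (PCA). Hypothetical stand-in for a learned transform."""
    mean = emb.mean(axis=0)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(emb - mean, full_matrices=False)
    return mean, vt.T  # columns = directions; the first k columns define a k-prefix


def prefix_embed(emb: np.ndarray, mean: np.ndarray, proj: np.ndarray, k: int) -> np.ndarray:
    """Apply the shared transform, keep only the first k coordinates, L2-normalize."""
    z = (emb - mean) @ proj[:, :k]
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def recall_at_1(img: np.ndarray, txt: np.ndarray) -> float:
    """Fraction of images whose most similar caption is the paired one (row i <-> row i)."""
    sims = img @ txt.T  # cosine similarity, since rows are unit-norm
    return float(np.mean(sims.argmax(axis=1) == np.arange(len(img))))


# Synthetic stand-ins for paired frozen embeddings (hypothetical data).
rng = np.random.default_rng(0)
n, d = 200, 64
latent = rng.normal(size=(n, d))              # content shared by an image and its caption
img = latent + 0.5 * rng.normal(size=(n, d))  # "image" embeddings
txt = latent + 0.5 * rng.normal(size=(n, d))  # "caption" embeddings

# A single transform shared by both modalities, fit on their union.
mean, proj = fit_prefix_transform(np.vstack([img, txt]))

results = {}
for k in (4, 16, 64):
    results[k] = recall_at_1(prefix_embed(img, mean, proj, k), prefix_embed(txt, mean, proj, k))
    print(f"prefix length {k:>2}: Recall@1 = {results[k]:.3f}")
```

Because one transform serves every prefix length, a caller can choose k at query time without refitting, and retrieval generally degrades as k shrinks. GraSP-VL's stated aim is to make that degradation semantically organized (coarse roles in short prefixes, finer distinctions in longer ones) rather than merely variance-ordered as with PCA.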

Impact: Treating embedding length as a semantic interface enables finer-grained control over the outputs of vision-language models.

Ranking rationale: This cluster contains one academic paper describing a new method for processing vision-language model embeddings.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CV · Honggang Qi · 2026-05-18 01:10

GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

Frozen vision-language embeddings contain signals at multiple semantic resolutions, from object identity to attributes, relations, and full-caption meaning, but they expose these signals through a fixed-length vector interface. We study whether embedding length can be turned into…

