Brief · PulseAugur

TOOL · arXiv cs.CV English (EN) · 1w

GraSP-VL: Length as a Semantic Granularity Interface for Vision-Language Representations

Researchers have developed GraSP-VL, a method for making better use of frozen vision-language model (VLM) embeddings by treating embedding length as a semantic granularity interface. The approach learns a shared prefix transform under which shorter prefixes of an embedding encode coarse semantic roles and longer prefixes reveal finer distinctions. Experiments on the COCO and Flickr30K datasets show that GraSP-VL reorganizes VLM embeddings into a truncatable semantic prefix interface, outperforming simple compression techniques.
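To make the truncation idea concrete, here is a minimal Python sketch of how a truncatable prefix embedding could be scored at different lengths. It is not GraSP-VL's implementation. The brief does not say how the shared prefix transform is trained, so the transform `W` below is a placeholder identity matrix. The toy 4-dimensional vectors are invented for illustration (they are not COCO or Flickr30K data), and the helper names (`apply_transform`, `prefix`, `cosine_at_length`, `rank_gallery`) are hypothetical. The sketch also assumes retrieval by cosine similarity on the re-normalized first `k` coordinates.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def apply_transform(W: Matrix, x: Sequence[float]) -> Vector:
    """Apply a (d_out x d_in) linear map to a frozen embedding.

    Hypothetical stand-in for a learned shared prefix transform.
    """
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def prefix(z: Sequence[float], k: int) -> Vector:
    """Keep only the first k coordinates and L2-normalize them."""
    head = list(z[:k])
    norm = math.sqrt(sum(v * v for v in head))
    return [v / norm for v in head] if norm > 0 else head


def cosine_at_length(W: Matrix, a: Sequence[float], b: Sequence[float], k: int) -> float:
    """Cosine similarity between two embeddings using only a length-k prefix."""
    pa = prefix(apply_transform(W, a), k)
    pb = prefix(apply_transform(W, b), k)
    return sum(x * y for x, y in zip(pa, pb))


def rank_gallery(W: Matrix, query: Sequence[float], gallery: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return gallery indices sorted by descending prefix similarity (ties keep input order)."""
    scores = [cosine_at_length(W, query, g, k) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: -scores[i])


if __name__ == "__main__":
    # Placeholder transform: identity. A real system would learn W.
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

    # Toy assumption: dim 0 is "coarse" (animal vs. vehicle), dim 1 is "finer" (dog vs. cat).
    query_dog_image = [1.0, 0.9, 0.0, 0.1]
    gallery = [
        [1.0, -1.0, 0.0, 0.0],  # caption: "a cat"
        [1.0, 1.0, 0.0, 0.0],   # caption: "a dog"
        [-1.0, 0.0, 1.0, 0.0],  # caption: "a car"
    ]

    print(rank_gallery(identity, query_dog_image, gallery, k=1))  # [0, 1, 2] (cat and dog tie)
    print(rank_gallery(identity, query_dog_image, gallery, k=2))  # [1, 0, 2] (dog now ranks first)
```

With only the first coordinate (`k=1`), the cat and dog captions tie at the top because the toy data places the coarse "animal vs. vehicle" distinction there; at `k=2` the next coordinate separates them. In a learned system, producing this coarse-to-fine ordering of coordinates would be the job of the trained transform rather than of hand-built vectors.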

IMPACT Enables more nuanced control over vision-language model embeddings by treating embedding length as a semantic interface.

Vision-Language Models
GraSP-VL
COCO/Flickr30K