Brief · PulseAugur

RESEARCH · arXiv cs.IR (Information Retrieval) · English (EN) · 3w · [2 sources]

MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation

Researchers have developed MVR-cache, a semantic caching system designed to reduce the cost and latency of serving Large Language Models (LLMs). The system combines Multi-Vector Retrieval (MVR) with a learnable prompt segmentation model so that it can identify matching prompts more accurately. It splits each prompt into segments and uses a reinforcement learning strategy. According to the authors, MVR-cache raises cache hit rates by up to 37% over existing state-of-the-art methods while maintaining strict correctness guarantees.
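To make the multi-vector idea concrete, here is a minimal Python sketch of a segment-level semantic cache. It is not MVR-cache's implementation. Several pieces are assumptions for illustration only. `segment_prompt` is a hypothetical rule-based stand-in for the learned segmenter. `embed` is a toy bag-of-words vector rather than a neural encoder. The 0.9 threshold is arbitrary. The symmetric "every segment must be covered" scoring is one conservative matching rule chosen here, not the paper's method.

```python
import math
import re
from collections import Counter
from typing import List, Optional, Tuple


def segment_prompt(prompt: str) -> List[str]:
    # Hypothetical stand-in for a learned segmenter: split on sentence punctuation.
    parts = re.split(r"[.?!;]\s*", prompt)
    return [p.strip().lower() for p in parts if p.strip()]


def embed(segment: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real system would use a neural encoder.
    return Counter(re.findall(r"\w+", segment))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def coverage(src: List[Counter], dst: List[Counter]) -> float:
    # Every segment in src must find a close segment in dst; the weakest match decides.
    if not src or not dst:
        return 0.0
    return min(max(cosine(s, d) for d in dst) for s in src)


class MultiVectorCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold  # arbitrary illustrative value
        self.entries: List[Tuple[List[Counter], str]] = []

    def put(self, prompt: str, response: str) -> None:
        # Store one vector per segment rather than a single vector per prompt.
        vecs = [embed(s) for s in segment_prompt(prompt)]
        if vecs:
            self.entries.append((vecs, response))

    def get(self, prompt: str) -> Optional[str]:
        query = [embed(s) for s in segment_prompt(prompt)]
        best_response, best_score = None, self.threshold
        for vecs, response in self.entries:
            # Symmetric check: the query must cover the cached prompt and vice versa.
            score = min(coverage(query, vecs), coverage(vecs, query))
            if score >= best_score:
                best_response, best_score = response, score
        return best_response


cache = MultiVectorCache()
cache.put("What is the capital of France? Answer briefly.", "Paris")
reordered = cache.get("Answer briefly. What is the capital of France?")
different = cache.get("What is the capital of Germany? Answer briefly.")
partial = cache.get("What is the capital of France?")
```

In this sketch, the reordered prompt hits the cache because each segment matches a cached segment exactly. The "Germany" prompt misses because its closest segment only reaches a cosine of about 0.83. The shorter prompt also misses, since the cached "answer briefly" instruction has no counterpart in it. Requiring coverage in both directions trades some hit rate for safety: a cached answer is never reused when either prompt carries a segment the other lacks.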

IMPACT MVR-cache's significant improvement in cache hit rates could lead to reduced operational costs and faster response times for LLM-powered applications.

Large Language Models
MVR-cache
Multi-Vector Retrieval