MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation

MVR-cache raises LLM semantic cache hit rates by up to 37%

By the PulseAugur editorial team · [2 sources] · 2026-05-24 07:33

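Researchers have developed MVR-cache, a new semantic caching system designed to reduce the cost and latency of large language models (LLMs). The system uses multi-vector retrieval (MVR) and a learned prompt segmentation model to identify matching prompts more accurately. By segmenting prompts intelligently and applying a reinforcement learning strategy, MVR-cache improves cache hit rates by up to 37% over existing state-of-the-art methods while maintaining strict correctness guarantees.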
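The summary does not say how segments are produced or scored, so the following is only a minimal sketch of the general idea of a multi-vector cache lookup. Everything in it is an assumption made for illustration: `segment` is a naive punctuation splitter standing in for the learned segmentation model, `embed` is a toy bag-of-words vector standing in for a neural encoder, the MaxSim-style score is borrowed from late-interaction retrieval rather than taken from the paper, and the class name `MultiVectorCache` and the `threshold` value are hypothetical. The reinforcement learning strategy and the correctness guarantees mentioned above are not modeled.

```python
import math
import re
from collections import Counter


def segment(prompt):
    """Naive stand-in for a learned segmenter: split on sentence punctuation."""
    parts = re.split(r"[.?!;\n]+", prompt)
    return [p.strip() for p in parts if p.strip()]


def embed(text):
    """Toy L2-normalized bag-of-words vector (a real system would use an encoder)."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def cosine(u, v):
    """Dot product of two normalized sparse vectors."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())


def maxsim(query_vecs, cached_vecs):
    """Mean, over query segments, of the best cosine match among cached segments."""
    if not query_vecs or not cached_vecs:
        return 0.0
    total = sum(max(cosine(q, c) for c in cached_vecs) for q in query_vecs)
    return total / len(query_vecs)


class MultiVectorCache:
    """Hypothetical cache that stores one vector per prompt segment."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed value, not from the paper
        self.entries = []  # list of (segment vectors, cached response)

    def put(self, prompt, response):
        self.entries.append(([embed(s) for s in segment(prompt)], response))

    def get(self, prompt):
        query = [embed(s) for s in segment(prompt)]
        best_score, best_response = 0.0, None
        for vecs, response in self.entries:
            # Score in both directions so extra, unmatched segments on
            # either side lower the score instead of being ignored.
            score = min(maxsim(query, vecs), maxsim(vecs, query))
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None


cache = MultiVectorCache()
cache.put("Translate to French. The cat sits on the mat.",
          "Le chat est assis sur le tapis.")
print(cache.get("The cat sits on the mat. Translate to French."))
print(cache.get("Translate to German. The dog sleeps on the sofa."))
```

In this sketch the first lookup hits because each segment finds an exact counterpart regardless of order, while the second falls below the threshold and returns `None`. Matching at the segment level is what lets reordered or partially shared prompts be compared piece by piece instead of through a single prompt-wide vector.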

Impact: The marked improvement in MVR-cache's cache hit rate could lower operating costs and shorten response times for LLM-powered applications.

Ranking rationale: This cluster contains an academic paper detailing a new method for optimizing LLM semantic caching.

Read on arXiv cs.IR (Information Retrieval)

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Ali Noshad, Zishan Zheng, Yinjun Wu · 2026-05-26 04:00

MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation

arXiv:2605.24914v1 Announce Type: cross Abstract: To reduce LLM costs and latency, semantic caching systems must accurately identify when a new prompt matches a cached one. Current methods often rely on simplistic similarity measures, which limit their effectiveness. We introduce…

arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Yinjun Wu · 2026-05-24 07:33

MVR-cache: Optimizing Semantic Caching via Multi-Vector Retrieval and Learned Prompt Segmentation

To reduce LLM costs and latency, semantic caching systems must accurately identify when a new prompt matches a cached one. Current methods often rely on simplistic similarity measures, which limit their effectiveness. We introduce MVR-cache, a novel semantic caching approach that…

