Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets

Nexus Sampling improves LLM KV-cache eviction and reduces memory use

By the PulseAugur editorial team · [2 sources] · 2026-06-22 21:42

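Researchers have developed Nexus Sampling, a new method for KV-cache eviction in large language models that targets long-context and agentic workloads. The method requires no training. It combines a Nexus score with weighted reservoir sampling so that important tokens, which deterministic top-K selection might discard, can still be retained. According to the authors, Nexus Sampling is theoretically better than conventional methods at preserving subtly important tokens, and on benchmarks such as LongBench it empirically matches dense attention while substantially reducing cache memory use.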
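The contrast between deterministic top-K selection and weighted reservoir sampling can be sketched in Python. This is a minimal illustration under stated assumptions, not the paper's implementation: it uses the standard Efraimidis-Spirakis weighted reservoir scheme, and the function names, the `budget` parameter, and the example scores are all hypothetical. The Nexus score itself is not described in this summary, so arbitrary positive scores stand in for it.

```python
import heapq
import random


def weighted_reservoir_keep(scored_tokens, budget, seed=None):
    """Keep up to `budget` tokens from a stream, favoring higher scores.

    scored_tokens: iterable of (token_id, score) pairs.
    The scores are placeholders (assumption); the Nexus score is not
    reproduced here. Uses the Efraimidis-Spirakis key u ** (1 / score).
    """
    if budget <= 0:
        return []
    rng = random.Random(seed)
    heap = []  # min-heap of (key, token_id); the smallest key is evicted first
    for token_id, score in scored_tokens:
        if score <= 0:
            continue  # non-positive weights are never retained
        key = rng.random() ** (1.0 / score)
        if len(heap) < budget:
            heapq.heappush(heap, (key, token_id))
        elif key > heap[0][0]:
            # New token displaces the current weakest entry in the reservoir.
            heapq.heapreplace(heap, (key, token_id))
    return sorted(token_id for _, token_id in heap)


def top_k_keep(scored_tokens, budget):
    """Deterministic baseline: keep the `budget` highest-scoring tokens."""
    best = heapq.nlargest(budget, scored_tokens, key=lambda pair: pair[1])
    return sorted(token_id for token_id, _ in best)


if __name__ == "__main__":
    # Hypothetical stream of (token_id, score) pairs.
    stream = [(0, 0.1), (1, 5.0), (2, 3.0), (3, 0.2)]
    print(top_k_keep(stream, 2))  # always [1, 2]
    print(weighted_reservoir_keep(stream, 2, seed=0))
```

With top-K, tokens 0 and 3 in this toy stream can never survive. With weighted sampling they survive with a small probability proportional to their scores, which is the general property the summary attributes to the sampling step.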

Impact: The method could substantially shrink the memory footprint of LLMs, enabling more complex and longer-context applications.

Ranking rationale: This cluster contains an academic paper that details a new method for optimizing LLM inference.


AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.LG TIER_1 English(EN) · Duc Duong, Hoang Anh Duy Le, Jianwen Xie, Anshumali Shrivastava, Zhaozhuo Xu · 2026-06-24 04:00

Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets

arXiv:2606.23961v1 Announce Type: new Abstract: Long-context and agentic LLM workloads push the KV cache past any fixed memory budget, forcing the inference stack to permanently evict tokens at every step of a continuous-inference stream. Existing methods all share the same templ…
arXiv cs.LG TIER_1 English(EN) · Zhaozhuo Xu · 2026-06-22 21:42

Forget Without Compromise: Nexus Sampling for Streaming KV-Cache Eviction Under Fixed Budgets

Long-context and agentic LLM workloads push the KV cache past any fixed memory budget, forcing the inference stack to permanently evict tokens at every step of a continuous-inference stream. Existing methods all share the same template, a per-step direct-attention score followed …

