KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

KV-Fold enables long-context LLM inference without retraining

By the PulseAugur editorial team · [1 source] · 2026-05-12 17:53

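Researchers have developed KV-Fold, a new method for extending the context window of large language models without retraining. The technique treats the key-value (KV) cache as the accumulator in a functional-programming-style fold, so the model can process a sequence chunk by chunk while maintaining a stable internal state. On needle-in-a-haystack benchmarks, KV-Fold reports 100% exact-match retrieval across a range of context lengths and model sizes, and it runs within the memory limits of a single GPU.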
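The fold structure described above can be illustrated with a toy sketch. This is not the paper's implementation: the cache is modeled as a plain list of pairs, and `step`, `chunked`, and `kv_fold` are hypothetical names introduced only for illustration. The source abstract is truncated, so the exact update rule is not known here; the sketch simply assumes the cache grows by concatenation.

```python
from functools import reduce
from typing import List, Sequence, Tuple

# Toy stand-in for a KV cache: one (position, token) pair per token seen so far.
# A real cache would hold per-layer key and value tensors instead.
Cache = List[Tuple[int, int]]


def chunked(tokens: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split a token sequence into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def step(cache: Cache, chunk: Sequence[int]) -> Cache:
    """Hypothetical fold step: process `chunk` conditioned on `cache`.

    A real model would attend over the cached keys and values here. This toy
    version only uses the cache length to assign positions, which is enough
    to show that each step depends on the accumulator.
    """
    new_entries = [(len(cache) + offset, tok) for offset, tok in enumerate(chunk)]
    # Assumption: the accumulator is extended by plain concatenation.
    return cache + new_entries


def kv_fold(tokens: Sequence[int], chunk_size: int) -> Cache:
    """Left fold over chunks, starting from an empty cache."""
    return reduce(step, chunked(tokens, chunk_size), [])


if __name__ == "__main__":
    print(kv_fold([10, 20, 30, 40, 50], chunk_size=2))
```

In this toy version the final cache is the same whether the sequence is processed in chunks of two tokens or in a single pass, which is the property a chunked protocol aims to approximate. Whether and how closely a real model preserves it is an empirical question that the sketch does not address.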

Impact: Lets LLMs handle longer contexts without costly retraining, which may improve performance on tasks that require extensive background information.

Ranking rationale: This cluster contains an academic paper detailing a new method for LLM inference.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI · Alvaro Velasquez · 2026-05-12 17:53

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly prod…

