KV-Fold enables long-context LLM inference without retraining

作者 PulseAugur 编辑部 · [1 个来源] · 2026-05-12 17:53

Researchers have developed KV-Fold, a novel method for extending the context window of large language models without requiring retraining. This technique treats the key-value cache as an accumulator in a functional programming-style fold, allowing the model to process sequential chunks of data while maintaining a stable internal state. KV-Fold has demonstrated 100% exact-match retrieval on needle-in-a-haystack benchmarks across various context lengths and model sizes, operating within the memory constraints of a single GPU. AI

影响 Enables LLMs to process significantly longer contexts without costly retraining, potentially improving performance on tasks requiring extensive background information.

排序理由 The cluster contains an academic paper detailing a new method for LLM inference. [lever_c_demoted from research: ic=1 ai=1.0]

在 arXiv cs.AI 阅读 →

AI 生成摘要 · Google Gemini · 来自 1 个来源。我们如何撰写摘要 →

报道来源 [1]

arXiv cs.AI TIER_1 English(EN) · Alvaro Velasquez · 2026-05-12 17:53

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly prod…

报道来源 [1]

KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

相关实体

相关话题