PulseAugur

KV-Fold enables long-context LLM inference without retraining

Researchers have developed KV-Fold, a method for extending the context window of large language models without retraining. The technique treats the key-value (KV) cache as the accumulator in a functional-programming-style fold: the model processes the input as a sequence of chunks, each one conditioned on the cache accumulated so far, while maintaining a stable internal state. KV-Fold has demonstrated 100% exact-match retrieval on needle-in-a-haystack benchmarks across various context lengths and model sizes, operating within the memory constraints of a single GPU.
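To make the fold analogy concrete, here is a minimal Python sketch. It is not the paper's implementation: `Cache`, `chunked`, `step`, and `kv_fold` are hypothetical names, and the "cache" is a plain list of (position, token) pairs standing in for real key-value tensors. The only thing it illustrates is the control flow of a left fold in which the cache is the accumulator and each chunk is processed against the cache built so far.

```python
from functools import reduce
from typing import List, Tuple

# Hypothetical stand-in for a KV cache: one (position, token) pair per token seen.
Cache = List[Tuple[int, int]]


def chunked(tokens: List[int], size: int) -> List[List[int]]:
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def step(cache: Cache, chunk: List[int]) -> Cache:
    """Toy fold step: process `chunk` given `cache` and return the extended cache.

    A real model would run a forward pass over the chunk here. This toy only
    records each token with its absolute position, which depends on how many
    entries the accumulated cache already holds.
    """
    new_entries = [(len(cache) + i, tok) for i, tok in enumerate(chunk)]
    return cache + new_entries


def kv_fold(tokens: List[int], chunk_size: int) -> Cache:
    """Left fold over the chunks, with the cache as the accumulator."""
    return reduce(step, chunked(tokens, chunk_size), [])


tokens = list(range(100, 110))          # ten made-up token ids
folded = kv_fold(tokens, chunk_size=4)  # chunks of 4, 4, and 2 tokens
one_shot = step([], tokens)             # the same tokens as a single chunk
```

In this toy, `folded` and `one_shot` are identical by construction, so it says nothing about how well a real model retrieves information under chunked processing. It only shows the shape of the recurrence the summary describes.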

Impact: Enables LLMs to process significantly longer contexts without costly retraining, potentially improving performance on tasks requiring extensive background information.

Ranking rationale: The cluster contains an academic paper detailing a new method for LLM inference.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Alvaro Velasquez

    KV-Fold: One-Step KV-Cache Recurrence for Long-Context Inference

    We introduce KV-Fold, a simple, training-free long-context inference protocol that treats the key-value (KV) cache as the accumulator in a left fold over sequence chunks. At each step, the model processes the next chunk conditioned on the accumulated cache, appends the newly prod…