Predictive Prefetching for Retrieval-Augmented Generation

New RAG framework predicts information needs to reduce latency

By the PulseAugur editorial team · [1 source] · 2026-05-18 07:45

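Researchers have developed a new retrieval-augmented generation (RAG) framework that significantly reduces latency by predicting information needs and prefetching the relevant content. The system analyzes generation dynamics to anticipate information needs several tokens in advance, enabling asynchronous retrieval that is more efficient than current approaches. Experiments show substantial reductions in end-to-end latency and time to first token while preserving the quality of the generated answers.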
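As a rough illustration of the general idea of overlapping retrieval with decoding, and not the paper's actual method, the sketch below starts a retrieval task a few tokens before its result is needed. Everything here is an assumption made for illustration: the `retrieve`, `predict_need`, and `generate` functions, the `<need:...>` placeholder convention, and the simulated delays. In particular, the toy predictor simply peeks ahead in a scripted token list, whereas the framework described above is reported to infer upcoming needs from generation dynamics.

```python
import asyncio
from typing import List, Optional, Tuple


async def retrieve(query: str, delay: float = 0.05) -> str:
    """Stand-in retriever (hypothetical); the sleep simulates index/network latency."""
    await asyncio.sleep(delay)
    return f"doc for '{query}'"


def predict_need(tokens: List[str], i: int, lookahead: int) -> Optional[str]:
    """Toy predictor (assumption): peek `lookahead` tokens ahead for a <need:...> marker."""
    j = i + lookahead
    if j < len(tokens) and tokens[j].startswith("<need:"):
        return tokens[j][6:-1]  # strip "<need:" and the trailing ">"
    return None


async def generate(
    tokens: List[str], lookahead: int, step: float = 0.02
) -> Tuple[List[str], List[tuple]]:
    """Replay a scripted token stream, prefetching documents ahead of their use."""
    pending = {}  # query -> in-flight retrieval task
    out, events = [], []
    for i, tok in enumerate(tokens):
        query = predict_need(tokens, i, lookahead)
        if query is not None and query not in pending:
            # Launch retrieval now; it runs while later tokens are decoded.
            pending[query] = asyncio.create_task(retrieve(query))
            events.append(("prefetch", i, query))
        if tok.startswith("<need:"):
            q = tok[6:-1]
            task = pending.pop(q, None)
            if task is None:
                # Nothing was prefetched: fall back to retrieving on demand.
                task = asyncio.create_task(retrieve(q))
            out.append(await task)
            events.append(("consume", i, q))
        else:
            await asyncio.sleep(step)  # simulated decoding time for one token
            out.append(tok)
    return out, events


if __name__ == "__main__":
    script = ["The", "capital", "is", "<need:capital of France>", "."]
    text, log = asyncio.run(generate(script, lookahead=3))
    print(text)
    print(log)
```

With `lookahead=0` the retrieval starts only when the document is needed, which corresponds to the synchronous baseline; a larger `lookahead` lets the simulated retrieval delay overlap with decoding.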

Impact: Lower latency in RAG systems could speed up AI-driven information retrieval and generation.

Ranking rationale: This cluster contains an academic paper detailing a new technical method. [lever_c_demoted from research: ic=1 ai=1.0]

AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.CL TIER_1 English(EN) · Shichao Pei · 2026-05-18 07:45

Predictive Prefetching for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) improves factual grounding in large language models but suffers from substantial latency due to synchronous retrieval. While recent work explores asynchronous retrieval, existing approaches rely on heuristic coordination between retrieval and …