The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System

RAG systems face a "coverage illusion" that wastes LLM spend

By the PulseAugur editorial team · [2 sources] · 2026-05-26 16:08

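A new research paper introduces the "coverage illusion", a phenomenon observed in retrieval-augmented generation (RAG) systems: query augmentation is applied to every query, which incurs unnecessary LLM inference cost and latency. In a case study on the Danish national encyclopedia, synthetic queries suggested that more than 90% of queries needed augmentation, yet only 27.8% of real user queries actually did. The paper proposes a post-retrieval cascade that escalates to LLM augmentation only when necessary. According to the paper, this improves quality, cuts latency by 31.8%, and serves most queries without any LLM augmentation.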
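The idea of a post-retrieval cascade can be illustrated with a short Python sketch. The summary does not say how the paper decides when to escalate, so everything here is an assumption made for illustration: the names `cascade_retrieve`, `retrieve`, `augment`, and `min_top_score` are hypothetical, and the "top retrieval score below a threshold" rule is a stand-in, not the authors' criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Hit = Tuple[str, float]  # (document id, retrieval score); hypothetical shape


@dataclass
class CascadeResult:
    hits: List[Hit]
    escalated: bool  # True if the LLM augmentation step was used


def cascade_retrieve(
    query: str,
    retrieve: Callable[[str], List[Hit]],
    augment: Callable[[str], str],
    min_top_score: float = 0.5,  # assumed threshold, not taken from the paper
) -> CascadeResult:
    """Retrieve with the raw query first; call the LLM only if results look weak."""
    hits = retrieve(query)
    top_score = max((score for _, score in hits), default=float("-inf"))
    if top_score >= min_top_score:
        # Cheap path: the plain query was good enough, so no LLM call is made.
        return CascadeResult(hits=hits, escalated=False)
    # Expensive path: rewrite the query (e.g. HyDE or expansion) and retry.
    return CascadeResult(hits=retrieve(augment(query)), escalated=True)


# Toy stand-ins so the sketch runs without a real index or LLM.
def toy_retrieve(query: str) -> List[Hit]:
    index = {
        "copenhagen": [("doc-1", 0.9)],
        "capital of denmark": [("doc-1", 0.8)],
    }
    return index.get(query.lower(), [("doc-9", 0.1)])


def toy_augment(query: str) -> str:
    return "capital of denmark"  # placeholder for an LLM rewrite


easy = cascade_retrieve("Copenhagen", toy_retrieve, toy_augment)
hard = cascade_retrieve("where do the Danes govern from", toy_retrieve, toy_augment)
print(easy.escalated, hard.escalated)
```

In this toy run the first query is answered on the cheap path and only the second one triggers the augmentation step. If the share of real queries needing augmentation is 27.8%, as the case study reports, a cascade of this kind would skip the LLM call for the remaining 72.2%.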

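Impact: Identifies a significant inefficiency in RAG systems; addressing it could save production deployments substantial LLM cost and reduce latency.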

Ranking rationale: This cluster contains a research paper that describes a new phenomenon in RAG systems and proposes a solution.

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL TIER_1 English(EN) · Zafar Hussain, Kristoffer Nielbo · 2026-05-27 04:00

The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System

arXiv:2605.27220v1 Announce Type: new Abstract: In modern RAG pipelines, query augmentation methods such as HyDE and query expansion are applied to every query, resulting in substantial LLM inference costs and increased end-to-end latency. The empirical justification for this ove…
arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Kristoffer Nielbo · 2026-05-26 16:08

The Coverage Illusion: From Pre-retrieval Routing Failure to Post-retrieval Cascades in a Production RAG System

In modern RAG pipelines, query augmentation methods such as HyDE and query expansion are applied to every query, resulting in substantial LLM inference costs and increased end-to-end latency. The empirical justification for this overhead in real production traffic remains largely…
