PulseAugur

New DICE method improves long-document retrieval by preserving chunk evidence

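Researchers have developed a method called DICE (Document Inference via Chunk Evidence) to improve long-document retrieval in dense retrieval systems. The technique targets a specific failure: key information in a long document can be diluted during encoding, so the document is not retrieved. DICE splits a document into chunks, encodes each chunk independently, and then aggregates the chunk representations into a single vector, keeping the standard one-query, one-document interface. Compared with conventional single-vector baselines, the method lowers the Evidence Dilution Index (EDI) and shows significant improvements on documents longer than 4k tokens.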
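The split, encode, and aggregate pipeline described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the summary does not say how DICE aggregates chunk vectors, so the element-wise max pooling here is an assumption, and the bag-of-words `encode` function, the two-word vocabulary, and the chunk size of 50 are hypothetical stand-ins for a real dense encoder and its settings.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def encode(tokens, vocab):
    """Toy encoder (assumption): normalized term counts over a fixed vocabulary."""
    vec = [0.0] * len(vocab)
    for tok in tokens:
        idx = vocab.get(tok)
        if idx is not None:
            vec[idx] += 1.0
    return l2_normalize(vec)

def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def encode_document_chunked(tokens, vocab, chunk_size):
    """Encode each chunk independently, then pool into ONE document vector.

    Element-wise max pooling is an assumed aggregation, chosen only for
    illustration; the source does not specify the aggregation DICE uses.
    """
    chunk_vecs = [encode(c, vocab) for c in split_into_chunks(tokens, chunk_size)]
    if not chunk_vecs:
        return [0.0] * len(vocab)
    pooled = [max(column) for column in zip(*chunk_vecs)]
    return l2_normalize(pooled)

def dot(a, b):
    """Inner product used as the query-document relevance score."""
    return sum(x * y for x, y in zip(a, b))

# A long document with one short, decisive span buried in filler text.
vocab = {"filler": 0, "needle": 1}
doc_tokens = ["filler"] * 200 + ["needle"] + ["filler"] * 200

query_vec = encode(["needle"], vocab)
single_vec = encode(doc_tokens, vocab)                        # single-vector baseline
chunked_vec = encode_document_chunked(doc_tokens, vocab, 50)  # chunk-then-aggregate

single_score = dot(query_vec, single_vec)
chunked_score = dot(query_vec, chunked_vec)
print(f"single-vector score: {single_score:.4f}")
print(f"chunk-aggregated score: {chunked_score:.4f}")
```

In this toy setup the decisive term carries more weight in the chunk-aggregated vector than in the single-vector encoding, because it only competes with the 49 other tokens in its own chunk rather than with the whole document. Both variants still produce one vector per document, so the ranking interface is unchanged.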

Impact: The method could significantly improve the performance of search and retrieval systems that process large volumes of text.

Ranking rationale: This cluster contains an academic paper that details a new method for improving long-document retrieval.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources. How we write summaries →

Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Shanshan Lyu, Yiwei Wang, Yujun Cai, Jiafeng Guo, Shenghua Liu ·

    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

    arXiv:2606.18781v1 Announce Type: new Abstract: Dense retrieval ranks one query vector against one document vector. On long documents, this interface can fail when a short but decisive span is weakened during document encoding before ranking. We study this failure mode as documen…

  2. arXiv cs.CL TIER_1 English(EN) · Shenghua Liu ·

    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

    Dense retrieval ranks one query vector against one document vector. On long documents, this interface can fail when a short but decisive span is weakened during document encoding before ranking. We study this failure mode as document-side early compression and introduce the Evide…