PulseAugur

New DICE method enhances long-document retrieval by preserving chunk evidence

Researchers have developed DICE (Document Inference via Chunk Evidence), a method for improving long-document retrieval in dense retrieval systems. It targets a failure mode in which a short but decisive span in a long document is diluted when the whole document is encoded into one vector, leading to retrieval failures. DICE splits each document into chunks, encodes the chunks independently, and aggregates the chunk representations into a single document vector, which preserves the standard one-query-one-document interface. The method reportedly improves retrieval, particularly for documents exceeding 4k tokens, by reducing the Evidence Dilution Index (EDI) relative to single-vector baselines.
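The split, encode, and aggregate flow described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: `toy_encode` is a hypothetical hashed bag-of-words stand-in for a real dense encoder, the chunk size is arbitrary, and mean pooling is only a placeholder, since the summary does not say how DICE aggregates chunk representations.

```python
import hashlib
import math

DIM = 64  # toy embedding size, chosen only for illustration


def l2_normalise(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def toy_encode(text, dim=DIM):
    """Hypothetical stand-in encoder: hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    return l2_normalise(vec)


def split_into_chunks(text, chunk_size=8):
    """Split on whitespace into consecutive chunks of at most chunk_size tokens."""
    tokens = text.split()
    return [" ".join(tokens[i:i + chunk_size])
            for i in range(0, len(tokens), chunk_size)]


def encode_document(text, chunk_size=8):
    """Encode each chunk independently, then aggregate into ONE document vector.

    Mean pooling is an assumed placeholder, not the aggregation used by DICE.
    """
    chunk_vecs = [toy_encode(c) for c in split_into_chunks(text, chunk_size)]
    if not chunk_vecs:
        return [0.0] * DIM
    mean = [sum(col) / len(chunk_vecs) for col in zip(*chunk_vecs)]
    return l2_normalise(mean)


def score(query, doc_vec):
    """Standard dense-retrieval scoring: one query vector against one document vector."""
    query_vec = toy_encode(query)
    return sum(q * d for q, d in zip(query_vec, doc_vec))


# A long document whose decisive span sits at the very end.
document = " ".join(["filler text about unrelated topics"] * 20
                    + ["the reactor shutdown code is 7421"])
doc_vec = encode_document(document)
print(len(doc_vec), score("reactor shutdown code", doc_vec))
```

Because `encode_document` still returns one vector per document, the ranking step stays a single dot product, which is the interface property the summary highlights. Plain mean pooling will itself dilute a short span, so it is best read as a baseline against which a better aggregation rule could be compared.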

IMPACT Search and retrieval systems that index long documents could benefit from this method, since it keeps the one-query-one-document interface that dense retrievers already use.

RANK_REASON The cluster contains an academic paper detailing a new method for improving long-document retrieval.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Shanshan Lyu, Yiwei Wang, Yujun Cai, Jiafeng Guo, Shenghua Liu ·

    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

    arXiv:2606.18781v1. Abstract: Dense retrieval ranks one query vector against one document vector. On long documents, this interface can fail when a short but decisive span is weakened during document encoding before ranking. We study this failure mode as documen…

  2. arXiv cs.CL TIER_1 English(EN) · Shenghua Liu ·

    Lost in a Single Vector: Improving Long-Document Retrieval with Chunk Evidence Aggregation

    Dense retrieval ranks one query vector against one document vector. On long documents, this interface can fail when a short but decisive span is weakened during document encoding before ranking. We study this failure mode as document-side early compression and introduce the Evide…