PulseAugur
No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

New retrieval method replaces K-means with sparse coding for faster, more accurate results

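Researchers have introduced Single-Stage Sparse Retrieval (SSR), a new method for efficient multi-vector retrieval that bypasses traditional K-means clustering. SSR uses a sparse autoencoder to create high-dimensional, sparse representations of token embeddings, so an inverted index can be used in place of compression. The approach significantly reduces indexing time and retrieval latency while improving accuracy, and it outperforms existing baselines on the BEIR benchmark.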
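The idea in the summary above (sparse token codes served from an inverted index) can be illustrated with a toy sketch. This is not the paper's implementation: the hand-written `DICTIONARY` stands in for a trained sparse autoencoder, the function names are hypothetical, and the ColBERT-style MaxSim scoring over sparse codes is an assumption about how such codes might be scored.

```python
from collections import defaultdict

# Assumption: a tiny fixed dictionary replaces the trained sparse autoencoder.
# Each row is one feature direction in the 2-d dense token space.
DICTIONARY = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]


def topk_sparse_encode(vec, weights, k):
    """Project a dense vector onto every feature, apply ReLU, keep the k largest."""
    acts = [max(0.0, sum(w * x for w, x in zip(row, vec))) for row in weights]
    top = sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k]
    return {i: acts[i] for i in top if acts[i] > 0.0}


def build_inverted_index(docs_sparse):
    """Map each active feature to postings of (doc_id, token_position, weight)."""
    index = defaultdict(list)
    for doc_id, tokens in enumerate(docs_sparse):
        for pos, code in enumerate(tokens):
            for dim, weight in code.items():
                index[dim].append((doc_id, pos, weight))
    return index


def search(query_sparse, index):
    """Sum, over query tokens, the best dot product with any token of each document."""
    scores = defaultdict(float)
    for q_code in query_sparse:
        sims = defaultdict(float)  # (doc_id, token_position) -> sparse dot product
        for dim, q_weight in q_code.items():
            for doc_id, pos, d_weight in index.get(dim, ()):
                sims[(doc_id, pos)] += q_weight * d_weight
        best = {}
        for (doc_id, _pos), sim in sims.items():
            best[doc_id] = max(best.get(doc_id, 0.0), sim)
        for doc_id, sim in best.items():
            scores[doc_id] += sim
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Two toy documents, each a list of dense token vectors (made-up values).
docs_dense = [
    [[1.0, 0.1], [0.2, 1.0]],
    [[-1.0, 0.0], [0.0, -1.0]],
]
docs_sparse = [[topk_sparse_encode(t, DICTIONARY, k=1) for t in doc] for doc in docs_dense]
index = build_inverted_index(docs_sparse)

query_sparse = [topk_sparse_encode([0.9, 0.0], DICTIONARY, k=1)]
results = search(query_sparse, index)
print(results)  # document 0 is the only document sharing a feature with the query
```

Because only postings for the query's active features are visited, documents with no shared feature are never scored. That is the property that lets an inverted index take the place of a clustering and compression stage in a sketch like this one.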

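Impact: The method offers significant improvements in indexing speed and retrieval latency for multi-vector retrieval systems, and could accelerate applications that rely on large-scale semantic search.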

Ranking rationale: This is an academic paper detailing a new technical method for information retrieval.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · based on 4 sources.

Sources [4]

  1. arXiv cs.AI TIER_1 English(EN) · Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You

    No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

    arXiv:2605.30120v1 Announce Type: cross Abstract: Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retr…

  2. arXiv cs.AI TIER_1 English(EN) · Chenyu You

    No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

    Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immens…

  3. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Chenyu You

    No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

    Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immens…

  4. Hugging Face Daily Papers TIER_1 English(EN)

    No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

    Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immens…