Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

New method extracts BM25-ready sparse features from dense retrieval models

By the PulseAugur editorial team · [2 sources] · 2026-05-28 05:36

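Researchers have introduced Latent Terms, a method showing that dense retrieval models can be decomposed into sparse features suited to traditional BM25 scoring. The technique trains sparse autoencoders on frozen retrievers and, without any retrieval-specific tuning or supervision, extracts a latent vocabulary with Zipfian statistics. On the LIMIT benchmark, Latent Terms matches or outperforms existing single-vector scoring methods and SPLADE variants, and it substantially outperforms its base model.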
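To make the idea concrete, here is a minimal sketch of BM25 scoring over latent "terms" rather than words. It is not the paper's implementation. The summary does not specify the autoencoder architecture, the sparsification rule, or how term frequencies are derived, so everything below is an assumption: the hand-written activation vectors stand in for sparse autoencoder outputs, `topk_latents` is a hypothetical top-k sparsifier, and the number of tokens that fire a latent is used as its term frequency.

```python
import math
from collections import Counter


def topk_latents(activations, k):
    """Indices of the k largest strictly positive activations (assumed sparsifier)."""
    ranked = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)
    return [i for i in ranked[:k] if activations[i] > 0]


def to_latent_terms(token_activations, k=1):
    """Map per-token activation vectors to a bag of latent ids.

    Assumption: the count of tokens firing a latent plays the role of
    term frequency; the paper may define this differently.
    """
    counts = Counter()
    for acts in token_activations:
        counts.update(topk_latents(acts, k))
    return counts


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Standard BM25 (Lucene-style IDF), with latent ids as the vocabulary."""
    n_docs = len(docs)
    avg_len = sum(sum(d.values()) for d in docs) / n_docs
    doc_freq = Counter()
    for d in docs:
        doc_freq.update(d.keys())
    scores = []
    for d in docs:
        length = sum(d.values())
        score = 0.0
        for term in query_terms:
            tf = d.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * length / avg_len))
        scores.append(score)
    return scores


# Toy stand-ins for sparse autoencoder activations over 4 latents (made up).
toy_docs = [
    [[0.9, 0.0, 0.1, 0.0], [0.8, 0.0, 0.0, 0.2]],  # both tokens fire latent 0
    [[0.0, 0.7, 0.0, 0.0], [0.0, 0.0, 0.6, 0.0]],  # latents 1 and 2
    [[0.0, 0.0, 0.5, 0.0], [0.1, 0.0, 0.0, 0.9]],  # latents 2 and 3
]
toy_query = [[0.7, 0.0, 0.0, 0.1]]                 # fires latent 0

docs = [to_latent_terms(d) for d in toy_docs]
query = to_latent_terms(toy_query)
scores = bm25_scores(query, docs)
best = max(range(len(scores)), key=scores.__getitem__)
```

In this toy run only the first document shares a latent with the query, so it is the only one with a non-zero score. Once latents are treated as vocabulary entries, the scoring step needs nothing beyond an ordinary inverted-index BM25, which is the sense in which such features would be "BM25-ready".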

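Impact: This research suggests that dense retrieval models contain latent structure that can be exploited to make sparse retrieval more efficient and more effective.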

Ranking rationale: This cluster contains an academic paper detailing a new method for information retrieval.

Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Benjamin Clavié, Sean Lee, Aamir Shakir, Makoto P. Kato · 2026-05-29 04:00

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

arXiv:2605.29384v1 (cross-listed). Abstract: We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on fro…

arXiv cs.IR (Information Retrieval) · Makoto P. Kato · 2026-05-28 05:36

Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen retrievers, Sparse Autoencoders without any re…
