PulseAugur

New method extracts BM25-ready sparse features from dense retrieval models

Researchers have introduced Latent Terms, a method showing that dense retrieval models can be decomposed into sparse features suitable for traditional BM25 scoring. Sparse Autoencoders trained on frozen retrievers extract a latent vocabulary with Zipfian statistics, without retrieval-specific adjustments or supervision. Latent Terms matches or surpasses existing single-vector scoring methods and SPLADE variants, and significantly outperforms its base model on the LIMIT benchmark.
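To make the idea of "BM25-ready" sparse features concrete, here is a minimal sketch, not the authors' implementation. It assumes each document or query is represented by one or more Sparse Autoencoder activation vectors (hand-written toy values here), that the top-k positive latents of each vector are treated as "terms", and that term frequency is the number of vectors in which a latent fires. The function names `latent_terms` and `bm25_scores` and the parameter defaults are hypothetical; the scoring formula is standard Okapi BM25 with a Lucene-style IDF.

```python
import math
from collections import Counter


def latent_terms(vectors, k=2):
    """Count, per latent id, how many vectors have it among their top-k positive activations.

    Assumption: top-k selection stands in for whatever sparsification the SAE applies.
    """
    tf = Counter()
    for vec in vectors:
        ranked = sorted(range(len(vec)), key=lambda i: vec[i], reverse=True)
        tf.update(i for i in ranked[:k] if vec[i] > 0)
    return tf


def bm25_scores(query_tf, docs_tf, k1=1.2, b=0.75):
    """Score every document against the query with Okapi BM25 over latent-term counts."""
    if not docs_tf:
        return []
    n_docs = len(docs_tf)
    avg_len = sum(sum(d.values()) for d in docs_tf) / n_docs
    # Document frequency: number of documents in which each latent term occurs.
    df = Counter()
    for d in docs_tf:
        df.update(d.keys())
    scores = []
    for d in docs_tf:
        doc_len = sum(d.values())
        score = 0.0
        for term in query_tf:
            tf = d.get(term, 0)
            if tf == 0:
                continue
            idf = math.log(1.0 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
        scores.append(score)
    return scores


# Toy activations over a 4-latent dictionary (illustrative values only).
docs = [
    [[0.9, 0.0, 0.3, 0.0], [0.8, 0.1, 0.0, 0.0]],  # two vectors, e.g. two tokens
    [[0.0, 0.7, 0.0, 0.6]],                        # one vector
]
query = [[0.5, 0.0, 0.4, 0.0]]

docs_tf = [latent_terms(d) for d in docs]
query_tf = latent_terms(query)
scores = bm25_scores(query_tf, docs_tf)
best = max(range(len(scores)), key=scores.__getitem__)
print(best, scores)
```

In this toy setup the first document shares latents 0 and 2 with the query and ranks first, while the second shares none and scores zero. Because the features are integer ids with counts, they could in principle be stored in an ordinary inverted index, which is the practical appeal of a sparse vocabulary.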

IMPACT This research suggests that dense retrieval models possess underlying structures that can be leveraged for improved sparse retrieval, potentially enhancing search efficiency and effectiveness.


Read on arXiv cs.IR (Information Retrieval) →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.AI TIER_1 English(EN) · Benjamin Clavié, Sean Lee, Aamir Shakir, Makoto P. Kato ·

    Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

    arXiv:2605.29384v1 (announce type: cross). Abstract: We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on fro…

  2. arXiv cs.IR (Information Retrieval) TIER_1 English(EN) · Makoto P. Kato ·

    Latent Terms: Dense Retrievers Contain Trivially Extractable BM25-ready Zipfian Vocabularies

    We propose Latent Terms, a method revealing that models trained for dense retrieval, whether single- or multi-vector, learn representations that can trivially be decomposed into retrieval-ready sparse features. When trained on frozen retrievers, Sparse Autoencoders without any re…