This paper investigates the impact of different pre-training datasets and methods on the performance of Expanded-SPLADE (ESPLADE) models for neural information retrieval. The study found that models pre-trained on general corpora with higher learning rates achieved better retrieval effectiveness, even when their Masked Language Modeling accuracies were lower. The research also indicated that repeating the general pre-training dataset did not significantly improve retrieval effectiveness, and it highlighted a trade-off between retrieval cost and effectiveness in highly pruned settings.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Provides insights into optimizing pre-training strategies for neural information retrieval models, potentially improving search engine performance.
RANK_REASON This is a research paper published on arXiv detailing experimental findings on pre-training methods for information retrieval models.