PulseAugur

BM25 code retrieval improved with adaptive q-log odds

Researchers have developed a method called adaptive q-log odds that improves BM25, a widely used search ranking function, for code retrieval tasks. The technique modifies BM25's scoring formula so that it better distinguishes between similar code functions, adjusting how unique identifiers are weighted. On a dataset of Go code, the method raised normalized discounted cumulative gain (NDCG@10) by nearly 90%. The researchers also found that the fix's effectiveness depends on the tokenization process and that it has minimal impact on general text retrieval.
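For context, the baseline that the paper modifies can be sketched in a few lines. The summary does not give the adaptive q-log odds formula, so the code below shows only standard BM25 with a common non-negative IDF variant; the function name `bm25_scores`, the parameter defaults `k1=1.2` and `b=0.75`, and the toy documents are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against a tokenized `query`.

    Standard BM25 only; this is NOT the paper's adaptive q-log odds variant.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Rare terms (low df) get a larger IDF weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical pre-tokenized "code" documents.
docs = [
    ["func", "parse", "config", "file"],
    ["func", "parse", "json", "body"],
    ["func", "write", "log", "line"],
]
scores = bm25_scores(["parse", "config"], docs)
```

In this toy corpus the first document ranks highest because it matches both query terms, and the rarer term `config` contributes more than `parse`. How identifiers are split into tokens determines which terms count as rare, which is consistent with the reported dependence on tokenization.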

IMPACT Enhances code search capabilities, potentially improving developer productivity and the accuracy of retrieval-augmented coding systems.

RANK_REASON The cluster contains an academic paper detailing a new method for improving a specific algorithm.


AI-generated summary · Google Gemini · from 1 source.


COVERAGE [1]

  1. arXiv cs.AI TIER_1 English(EN) · Oktay Goktas

    Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

    In retrieval-augmented coding, failures often begin when the relevant file is absent from the retrieved context. Under frozen generic tokenization, where a BM25 index has been built by a search system whose analyzer the practitioner does not control, this failure is routine: BM25…