PulseAugur

BM25 code retrieval improved with adaptive q-log odds

Researchers have developed a method called adaptive q-log odds that improves BM25, a widely used lexical ranking function, on code retrieval tasks. The technique changes BM25's scoring formula so that it distinguishes better between similar code functions, by adjusting how much weight it gives to unique identifiers. On a dataset of Go code, the method increased normalized discounted cumulative gain at rank 10 (NDCG@10) by nearly 90%. The researchers also found that the gain depends on the tokenization used, and that the change has minimal impact on general text retrieval.
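As a rough illustration of where such a change could plug in: standard BM25 weights each query term by an IDF that is itself a log odds, ln((N - df + 0.5) / (df + 0.5)), where N is the number of documents and df is the term's document frequency. The sketch below computes plain Okapi BM25 with a pluggable IDF and adds a variant that replaces the natural log with the Tsallis q-logarithm, ln_q(x) = (x^(1-q) - 1) / (1 - q). That substitution is an assumption inferred only from the method's name, not the paper's confirmed formula. The helper names, the fixed q value, and the toy Go corpus are hypothetical, and because the paper's adaptive rule for choosing q is not described here, q is simply passed in.

```python
import math
from collections import Counter
from typing import Callable, List

# Hypothetical helpers for illustration only; NOT the paper's implementation.


def idf_log_odds(n_docs: int, df: int) -> float:
    """Classic BM25 (Robertson-Sparck Jones) IDF: natural log of the odds."""
    return math.log((n_docs - df + 0.5) / (df + 0.5))


def q_log(x: float, q: float) -> float:
    """Tsallis q-logarithm; reduces to math.log(x) as q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def idf_q_log_odds(n_docs: int, df: int, q: float) -> float:
    """ASSUMPTION: the same odds as above, passed through q_log instead of log."""
    return q_log((n_docs - df + 0.5) / (df + 0.5), q)


def bm25_score(query: List[str], doc: List[str], corpus: List[List[str]],
               idf: Callable[[int, int], float],
               k1: float = 1.2, b: float = 0.75) -> float:
    """Okapi BM25 score of one tokenized document, with a pluggable IDF."""
    n_docs = len(corpus)
    avgdl = sum(len(d) for d in corpus) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Saturated, length-normalized term frequency
        tf_part = tf[term] * (k1 + 1) / (
            tf[term] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf(n_docs, df) * tf_part
    return score


# Toy corpus of already-tokenized Go function signatures (hypothetical).
corpus = [
    ["func", "ParseConfig", "path", "string", "error"],
    ["func", "ParseFlags", "args", "string", "error"],
    ["func", "WriteConfig", "path", "string", "error"],
    ["func", "ReadFile", "name", "string", "error"],
    ["func", "Close", "conn", "net", "error"],
]
query = ["ParseConfig", "path"]

for name, idf in [
    ("log odds", idf_log_odds),
    ("q-log odds, q=0.5", lambda n, df: idf_q_log_odds(n, df, 0.5)),
]:
    scores = [round(bm25_score(query, d, corpus, idf), 3) for d in corpus]
    print(name, scores)
```

One property of the q-logarithm worth keeping in mind when reading the sketch: with q below 1, ln_q grows faster than ln for large odds, so rare terms (such as a unique identifier) gain relatively more weight; with q above 1, ln_q is bounded above by 1/(q - 1), which compresses them. Which regime the paper's adaptive rule selects is not stated in this summary.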

Impact: Enhances code search capabilities, potentially improving developer productivity and the accuracy of retrieval-augmented coding systems.

Ranking rationale: The cluster contains an academic paper describing a new method for improving a specific algorithm.


AI-generated summary · Google Gemini · from 1 source.


Sources [1]

  1. arXiv cs.AI · English · Oktay Goktas

    Improving BM25 Code Retrieval Under Fixed Generic Tokenization: Adaptive q-Log Odds as a Drop-In BM25 Fix

    In retrieval-augmented coding, failures often begin when the relevant file is absent from the retrieved context. Under frozen generic tokenization, where a BM25 index has been built by a search system whose analyzer the practitioner does not control, this failure is routine: BM25…