BM25
PulseAugur coverage of BM25 — every cluster mentioning BM25 across labs, papers, and developer communities, ranked by signal.
-
Stale documents in RAG systems pose significant risks, study finds
A recent study by Emory University and IBM Research investigated the impact of stale documents on retrieval-augmented generation (RAG) systems. The experiment revealed that outdated information in a RAG system…
-
Many-shot ICL boosts low-resource language translation, study finds
Researchers have conducted an empirical study on many-shot in-context learning (ICL) for machine translation, specifically focusing on low-resource languages. Their findings indicate that increasing the number of exampl…
-
Adaptive Re-Ranking cuts IR latency by routing queries efficiently
Researchers have introduced Adaptive Re-Ranking, a framework designed to optimize computational costs and latency in information retrieval systems. This method routes queries based on their complexity, employing differe…
-
Build Hybrid RAG System Combining Semantic and Keyword Search
This article details the construction of a hybrid Retrieval-Augmented Generation (RAG) system that combines the strengths of both semantic and keyword search. It addresses the limitations of single-mode retrieval, where…
-
AI agents fail due to flawed search index distribution, not prompting
A common issue in AI agents is that their search results appear correct but lead to factually wrong answers due to problems with the underlying search index. This is not a prompting issue but a distribution problem, whe…
-
HAKARI-Bench offers lightweight evaluation for retrieval models · 2 sources tracked
Researchers have introduced HAKARI-Bench, a lightweight benchmark designed to streamline the evaluation of retrieval architectures and efficiency settings for retrieval-augmented generation and semantic search. This new…
-
VISTA Architect AI system integrates LLMs with EHRs for medical data synthesis
Researchers have developed VISTA Architect, a novel AI system designed to integrate large language models with electronic health records (EHRs). This system transforms clinical data into a knowledge graph, creating a sy…
-
Novelty-Aware Agentic Retrieval System Enhances Scientific Literature Search
Researchers have developed a Novelty-Aware Research Agent, an agentic retrieval system designed to go beyond standard RAG by providing structured multi-step reasoning for scientific literature search. This system aims t…
-
Local 7B model study dissects agentic RAG for multi-hop QA
Researchers have conducted an ablation study on agentic retrieval-augmented generation (RAG) systems, specifically focusing on multi-hop question answering with a local 7B parameter model, Qwen2.5-7B-Instruct. The study…
-
PostgreSQL AI deployment challenges addressed by open-source stack
Mike Josephson from pgEdge discussed the challenges of deploying AI applications with PostgreSQL, highlighting that most current applications are still in experimental stages. He detailed an open-source stack, including…
-
Streaming RAG technique hides tool latency by stabilizing query intent early
A new arXiv paper investigates Streaming Retrieval-Augmented Generation (Streaming RAG), a technique that hides tool latency by issuing retrieval queries in parallel with user input. Researchers characterized "tool-inte…
-
Streaming RAG research quantifies latency reduction via tool-intent stabilization
A new research paper explores the effectiveness of Streaming Retrieval-Augmented Generation (Streaming RAG) in reducing latency for users. The study introduces the concept of 'tool-intent stabilization,' which measures …
-
AI agent learns to improve legal case retrieval through self-evolution
Researchers have developed a novel self-evolving agent framework designed to enhance legal case retrieval systems. This agent iteratively refines rewriting rules for the BM25 baseline by utilizing an LLM within an autom…
-
RAG pipelines: From BM25 to reranking for improved AI assistant accuracy
A developer detailed the process of building a retrieval-augmented generation (RAG) pipeline for an AI assistant integrated into a Go-based task queue system. The initial implementation used ChromaDB for vector search, …
-
daVinci-kernel uses RL to optimize GPU kernels with evolving skill library
Researchers have developed daVinci-kernel, a novel reinforcement learning framework designed to optimize GPU kernels. This system co-evolves skill selection, summarization, and utilization, employing three agents that s…
-
New framework redefines entity relevance for document retrieval
A new research paper proposes a framework to improve document re-ranking by distinguishing between conceptual entity relevance and observable entity relevance. The authors argue that current entity-aware retrieval metho…
-
RAG System Quality Hinges on Retrieval, Not Just Prompts
This article argues that most problems with Retrieval-Augmented Generation (RAG) systems stem from poor retrieval rather than the language model itself. The author suggests eight fixes, prioritizing improvements to the …
-
BM25 and Dense Fusion: Hybrid RAG for Exact Match Accuracy
A technical article discusses the limitations of pure vector search in Retrieval-Augmented Generation (RAG) systems, particularly when dealing with exact identifiers like error codes, product SKUs, or specific phrases. …
-
Chinese Parsers DeepDoc, MinerU Crossover in Japanese RAG Performance
A comparative analysis of two Chinese open-source document parsers, DeepDoc and MinerU, for Japanese RAG systems reveals a performance crossover that depends on the retrieval method used. DeepDoc demonstrated superior results …
-
Structured Parsing Boosts Dense Retrieval Performance in LLM RAG
A study comparing document parsing strategies for retrieval-augmented generation (RAG) found that structured parsing benefits dense retrieval significantly more than traditional BM25 methods. When using dense retrieval,…