QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples
Researchers have introduced QO-Bench, a benchmark that evaluates how well retrieval-augmented generation (RAG) systems preserve query operators when answering questions over structured event data. The benchmark comprises 22,984 news articles and 614 corporate events, with 785 questions that require precise query execution rather than semantic relevance alone. Current RAG systems struggle to retain the typed values that operators such as joins and intersections depend on, and often discard this information during retrieval. Even when given perfect evidence, operator execution remains a significant bottleneck, which indicates that better answer models are needed in addition to better retrieval.
AI IMPACT: Highlights a critical bottleneck in RAG systems for structured data, pushing research towards operator-preserving retrieval and better answer models.
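To make the idea of an operator over typed event tuples concrete, here is a minimal Python sketch of an intersection query. It is not taken from QO-Bench: the `EventTuple` schema, the sample events, and the helper names are all hypothetical, chosen only to show why an operator needs the typed fields of both operands to survive retrieval.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical schema: the benchmark's actual tuple format is not given in the summary.
@dataclass(frozen=True)
class EventTuple:
    company: str
    event_type: str
    event_date: date


# Made-up sample data, for illustration only.
EVENTS = [
    EventTuple("Acme", "acquisition", date(2023, 3, 1)),
    EventTuple("Acme", "layoff", date(2023, 6, 15)),
    EventTuple("Birch", "acquisition", date(2023, 4, 10)),
    EventTuple("Cedar", "layoff", date(2023, 5, 20)),
]


def companies_with(events, event_type):
    """Return the set of companies that have at least one event of the given type."""
    return {e.company for e in events if e.event_type == event_type}


def intersect_query(events, type_a, type_b):
    """Companies that appear with both event types (a set intersection)."""
    return companies_with(events, type_a) & companies_with(events, type_b)


# With all typed tuples available, the intersection can be executed exactly.
full_result = intersect_query(EVENTS, "acquisition", "layoff")

# Simulate a retriever that returned evidence for only one operand:
# the second operand is missing, so the operator cannot produce the answer.
lossy_evidence = [e for e in EVENTS if e.event_type == "acquisition"]
lossy_result = intersect_query(lossy_evidence, "acquisition", "layoff")

print(sorted(full_result))   # ['Acme']
print(sorted(lossy_result))  # []
```

In this toy setting the retrieved acquisition evidence is entirely relevant to the question, yet the answer is still lost because one operand's typed values never reach the operator. That is the kind of gap between semantic relevance and precise query execution the summary describes.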