Two Pre-Registered Benchmarks for Audit-Native RAG: RAB (EU AI Act 10/12/19) + LRB (Time-Travel Retrieval)
Two new benchmarks, RAB and LRB, evaluate Retrieval-Augmented Generation (RAG) systems for auditability and temporal accuracy. RAB, the Replayable-Audit Benchmark, assesses whether a system can replay its past decisions, in line with the record-keeping obligations associated with EU AI Act Articles 10, 12, and 19. LRB, the Lifecycle Retrieval Benchmark, tests whether a system can retrieve the data that was valid at a specified point in time, rather than only the most current version. Both benchmarks are pre-registered, designed to be deterministic and runnable locally, and accompanied by code and preprints.
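To make the two ideas concrete, here is a minimal Python sketch of point-in-time ("as-of") retrieval and of a replayable audit digest. It is not the benchmarks' own code or API: `DocVersion`, `retrieve_as_of`, `audit_digest`, and the sample "policy" texts are hypothetical names and made-up data chosen only for illustration.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class DocVersion:
    """One version of a document with a half-open validity interval (hypothetical schema)."""
    doc_id: str
    text: str
    valid_from: date
    valid_to: Optional[date]  # None means the version is still current


def retrieve_as_of(versions: List[DocVersion], as_of: date) -> List[DocVersion]:
    """Return the versions valid on `as_of`, i.e. valid_from <= as_of < valid_to."""
    hits = [
        v for v in versions
        if v.valid_from <= as_of and (v.valid_to is None or as_of < v.valid_to)
    ]
    # Sort so the result order does not depend on corpus order (determinism).
    return sorted(hits, key=lambda v: (v.doc_id, v.valid_from))


def audit_digest(query: str, as_of: date, hits: List[DocVersion]) -> str:
    """Hash the query, the as-of date, and the retrieved texts into a stable digest."""
    record = {
        "query": query,
        "as_of": as_of.isoformat(),
        "hits": [[v.doc_id, v.valid_from.isoformat(), v.text] for v in hits],
    }
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Made-up corpus: the same document changes on 2024-01-01.
corpus = [
    DocVersion("policy", "Retention period: 6 months.", date(2023, 1, 1), date(2024, 1, 1)),
    DocVersion("policy", "Retention period: 12 months.", date(2024, 1, 1), None),
]

past_hits = retrieve_as_of(corpus, date(2023, 6, 15))
current_hits = retrieve_as_of(corpus, date(2024, 6, 15))

# Replaying the same query against the same as-of date yields the same digest.
first_run = audit_digest("retention period", date(2023, 6, 15), past_hits)
replayed = audit_digest("retention period", date(2023, 6, 15),
                        retrieve_as_of(corpus, date(2023, 6, 15)))
```

In this sketch, a query dated mid-2023 returns the older policy text even though a newer version exists, and `first_run == replayed` is the kind of equality a replay check could assert. A real system would also need to pin the index, model, and prompt versions, which this example leaves out.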
IMPACT: These benchmarks provide a standardized way to test RAG systems for auditability and temporal accuracy, both of which matter for regulatory compliance and reliable AI applications.