OpenAI has introduced BrowseComp, a new benchmark for evaluating AI agents' ability to find hard-to-locate information online. Unlike simpler benchmarks that test retrieval of basic facts, BrowseComp comprises 1,266 challenging problems that require agents to sift through numerous websites. The questions are designed to be hard to solve but easy to verify, measuring how well AI handles complex information-retrieval tasks.
Summary written by gemini-2.5-flash-lite from 1 source.