Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work
Researchers have developed QuestBench, a benchmark built as a course-based practice: students learn about AI by constructing the benchmark and using it to evaluate AI systems. The approach asks students to define what constitutes a trustworthy answer, rather than treating AI simply as a productivity tool. The benchmark comprises 256 questions across 14 humanities and social science domains, and it revealed significant failures in current AI systems: the best performer, GPT-5.5, achieved a pass rate of only 57.58%.
IMPACT: Highlights the limitations of current AI in complex knowledge domains, emphasizing the need for better evaluation methods.