Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 5d · [2 sources]

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

Researchers have developed QuestBench, a benchmark built as a course-based practice in which students learn about AI by constructing a benchmark and evaluating AI systems against it. The approach asks students to define what constitutes a trustworthy answer, rather than treating AI as just a productivity tool. The benchmark, comprising 256 questions across 14 humanities and social science domains, revealed significant failures in current AI systems: the best performer, GPT-5.5, achieved only a 57.58% pass rate.

IMPACT Highlights the limitations of current AI in complex knowledge domains, emphasizing the need for better evaluation methods.

GPT-5.5
QuestBench
AI