New QuestBench benchmark reveals AI failures in humanities

By PulseAugur Editorial Team · [2 sources] · 2026-05-20 17:09

Researchers have developed QuestBench, a benchmark built as a course-based practice in which students learn about AI by constructing benchmark questions and using them to test AI systems. The approach asks students to define what constitutes a trustworthy answer, rather than treating AI only as a productivity tool. The benchmark comprises 256 questions across 14 humanities and social science domains and exposed significant failures in current AI systems: the best performer, GPT-5.5, achieved a pass rate of only 57.58%.

Impact: Highlights the limitations of current AI in complex knowledge domains and the need for better evaluation methods.

Ranking rationale: The cluster describes a new academic paper that introduces a benchmark for evaluating AI systems.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI TIER_1 English(EN) · Haiyang Shen, Jiuzheng Wang, Taian Guo, Mugeng Liu, Wenchun Jing, Chongyang Pan, Siqi Zhong, Zhiyang Chen, Weichen Bi, Yudong Han, Xiaoying Bai, Yun Ma · 2026-05-22 04:00

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

arXiv:2605.21413v2. Abstract: As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code, and use tools more efficiently. We argue that AI education also needs a set…

arXiv cs.AI TIER_1 English(EN) · Yun Ma · 2026-05-20 17:09

Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code, and use tools more efficiently. We argue that AI education also needs a setting in which students learn to test AI and unde…
