PulseAugur
Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

New QuestBench benchmark exposes AI failures in the humanities

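Researchers have developed QuestBench, a new benchmark intended to teach students about AI by having them construct the benchmark and evaluate AI systems against it. The approach encourages students to define what counts as a trustworthy answer rather than treating AI only as a productivity tool. The benchmark contains 256 questions across 14 humanities and social science fields and reveals significant shortcomings in current AI systems: the best performer, GPT-5.5, achieved a pass rate of only 57.58%.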

Impact: Highlights the limitations of current AI in complex knowledge domains and underscores the need for better evaluation methods.

Ranking rationale: This cluster describes an academic paper that introduces a novel benchmark for evaluating AI systems.

Read on arXiv cs.AI →

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.AI TIER_1 English(EN) · Haiyang Shen, Jiuzheng Wang, Taian Guo, Mugeng Liu, Wenchun Jing, Chongyang Pan, Siqi Zhong, Zhiyang Chen, Weichen Bi, Yudong Han, Xiaoying Bai, Yun Ma ·

    Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

    arXiv:2605.21413v2 Announce Type: new Abstract: As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code, and use tools more efficiently. We argue that AI education also needs a set…

  2. arXiv cs.AI TIER_1 English(EN) · Yun Ma ·

    Teaching AI Through Benchmark Construction: QuestBench as a Course-Based Practice for Accountable Knowledge Work

    As AI becomes part of everyday learning, many courses teach students to use it mainly as a productivity tool: how to prompt, search, summarize, write, code, and use tools more efficiently. We argue that AI education also needs a setting in which students learn to test AI and unde…