Evaluating Large Language Models on Computer Science University Exams in Data Structures

Large language models including GPT-4o and Claude 3.5 are tested on university computer science exams in data structures

By the PulseAugur editorial team · [1 source] · 2026-04-28 04:00

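Researchers have built a new benchmark dataset from data structures exam questions at Tel Aviv University to evaluate the performance of large language models. The study assesses how well models including OpenAI's GPT-4o, Anthropic's Claude 3.5, Mathstral 7B, and LLaMA 3 8B answer closed-book and multiple-choice questions. The results offer insight into the current capabilities of large language models in computer science education.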

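Impact: Provides a new evaluation dataset for large language models in computer science education and highlights their current performance limitations.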

Ranking rationale: This is a research paper that introduces a new benchmark dataset and evaluates existing large language models.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.CL · Edan Gabay, Yael Maoz, Jonathan Stahl, Naama Maoz, Abdo Amer, Orr Eilat, Hanoch Levy, Michal Kleinbort, Amir Rubinstein, Adi Haviv · 2026-04-28 04:00

Evaluating Large Language Models on Computer Science University Exams in Data Structures

arXiv:2604.23347v1. Abstract: We present a comprehensive evaluation of Large Language Models (LLMs) on Computer Science (CS) Data Structure examination questions. Our work introduces a new benchmark dataset comprising exam questions from Tel Aviv University (TAU…
