Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

LLMs evaluated for grading Linux/bash exams; Gemini 3.0 Pro leads

By the PulseAugur editorial team · [2 sources] · 2026-07-02 17:01

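A new study published on arXiv explores using large language models (LLMs) to grade Linux/bash examinations. Using a four-level cognitive taxonomy, the researchers compared four frontier LLMs (GPT, Claude Opus, Gemini, and GLM) against expert judgment. The results show that, with a rubric guided by enhanced prompts, Gemini 3.0 Pro agreed most closely with human graders, although accuracy declined as question complexity increased.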
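To make the comparison concrete, here is a minimal sketch of how agreement between an LLM grader and a human grader could be summarized per taxonomy level. It is illustrative only: the summary does not say which agreement metric the paper uses, so the function name `agreement_by_level`, the two metrics (exact-agreement rate and mean absolute error), the 0 to 1 score scale, and the sample data are all assumptions, not the authors' method or results.

```python
from collections import defaultdict
from statistics import mean


def agreement_by_level(records, tolerance=0.0):
    """Summarize LLM-vs-human agreement per cognitive level.

    records: iterable of (level, human_score, llm_score) tuples.
    tolerance: maximum absolute difference still counted as agreement.
    Both metrics are assumptions chosen for illustration.
    """
    grouped = defaultdict(list)
    for level, human, llm in records:
        grouped[level].append((human, llm))

    summary = {}
    for level, pairs in sorted(grouped.items()):
        diffs = [abs(human - llm) for human, llm in pairs]
        summary[level] = {
            "n": len(pairs),
            "agreement_rate": sum(d <= tolerance for d in diffs) / len(pairs),
            "mean_abs_error": mean(diffs),
        }
    return summary


# Made-up scores on a 0..1 scale; NOT data from the paper.
sample = [
    (1, 1.0, 1.0), (1, 0.5, 0.5),
    (2, 1.0, 1.0), (2, 0.5, 1.0),
    (3, 0.0, 0.5), (3, 1.0, 0.5),
]

result = agreement_by_level(sample)
for level, stats in result.items():
    print(level, stats)
```

Grouping by level before computing the metrics is what would expose a pattern like the one reported, where agreement falls as question complexity rises. A chance-corrected statistic such as Cohen's kappa could replace the raw agreement rate if the grades are categorical.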

Impact: LLMs show promise for automated grading in technical subjects, with accuracy depending on question complexity and prompt quality.

Ranking rationale: This cluster contains a paper that evaluates in detail how LLMs perform on a specific task.

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

arXiv cs.AI · Manuel Alonso-Carracedo, Ruben Fernandez-Boullon, Pedro Celard, Francisco J. Rodriguez-Martinez, Lorena Otero-Cerdeira · 2026-07-03 04:00

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

arXiv:2607.02432v1. Abstract: Scalable and reliable grading of command-line examinations remains a challenge in computing education, where rising enrolments make manual marking difficult and rule-based autograders cannot handle partial credit, equivalent solutio…

arXiv cs.AI · Lorena Otero-Cerdeira · 2026-07-02 17:01

Automated grading of Linux/bash examinations using large language models: a four-level cognitive taxonomy approach

Scalable and reliable grading of command-line examinations remains a challenge in computing education, where rising enrolments make manual marking difficult and rule-based autograders cannot handle partial credit, equivalent solutions, or syntactic variation. This paper evaluates…

