Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach

Large Language Models Correlate Moderately with Human Judgment in Argument Quality Assessment

By the PulseAugur editorial team · [2 sources] · 2026-05-27 11:14

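Researchers explored using large language models (LLMs) to assess argument quality, comparing 12 open-source models. They found a promising but moderate correlation between LLM judgments and those of human experts. Llama-70B aligned most closely with the experts, reaching a moderate Cohen's κ = 0.493. The results suggest that LLMs capture the dimensions of argument quality partially but in a complementary way, and that their predictions remain stable across multiple runs.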
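For readers unfamiliar with the agreement statistic quoted above, here is a minimal sketch of how Cohen's κ is computed for two raters. The function name `cohens_kappa` and the toy labels are illustrative assumptions; they are not the paper's data or evaluation code.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's label
    frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement: product of the raters' marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical label
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels): which argument of each pair is better.
expert = ["A", "A", "B", "B", "A", "B", "A", "B"]
model = ["A", "A", "B", "A", "A", "B", "B", "B"]
print(cohens_kappa(expert, model))  # 0.5
```

In this toy case the raters agree on 6 of 8 items (0.75) against a chance level of 0.5, which gives κ = 0.5, close to the moderate range reported for Llama-70B.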

Impact: Large language models show moderate ability to assess argument quality, with Llama-70B aligning most closely with human experts.

Ranking rationale: This cluster contains a research paper describing a new method for assessing argument quality with large language models.


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL · TIER_1 · English (EN) · Nicolás Benjamín Ocampo, Agnes Paullate Nyiranziza, Davide Ceolin · 2026-05-28 04:00

Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach

arXiv:2605.28313v1 · Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks related to reasoning and judgment. However, assessing the quality of arguments requires a rigorous evaluation. We investigate the extent to which LLMs c…

arXiv cs.CL · TIER_1 · English (EN) · Davide Ceolin · 2026-05-27 11:14

Argument Quality Assessment with Large Language Models: A Pairwise Bradley-Terry Approach

Large Language Models (LLMs) have demonstrated remarkable capabilities in tasks related to reasoning and judgment. However, assessing the quality of arguments requires a rigorous evaluation. We investigate the extent to which LLMs can effectively perform this task. We tested 12 o…
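
The title refers to a pairwise Bradley-Terry approach. As general background only, the sketch below fits a standard Bradley-Terry model to pairwise preference counts with the classic minorization-maximization update. The function `fit_bradley_terry`, the toy win matrix, and the iteration count are assumptions made for illustration; the truncated abstract does not describe the authors' actual procedure, so this should not be read as their implementation.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate Bradley-Terry strengths from a pairwise win-count matrix.

    wins[i][j] is the number of comparisons in which item i was preferred
    over item j. Under the model, P(i preferred over j) = p_i / (p_i + p_j).
    Assumes every item wins at least once and the comparison graph is
    connected, so the estimates are finite and positive.
    """
    n = len(wins)
    strengths = [1.0 / n] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            # Comparisons involving i, weighted by the current strengths.
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom if denom > 0 else strengths[i])
        # Strengths are identified only up to scale, so normalise to sum 1.
        norm = sum(updated)
        strengths = [s / norm for s in updated]
    return strengths


# Hypothetical judgments over three arguments (each pair compared 4 times).
toy_wins = [
    [0, 3, 4],  # argument 0 preferred 3 times over 1, 4 times over 2
    [1, 0, 3],
    [0, 1, 0],
]
strengths = fit_bradley_terry(toy_wins)
ranking = sorted(range(len(strengths)), key=lambda i: -strengths[i])
print(ranking)  # [0, 1, 2]
```

The fitted strengths turn many noisy pairwise preferences into a single ranking of arguments; in the toy data, argument 0 wins most often and ranks first.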

