
New Corpus and Metric Advance the Use of Large Language Models in Systematic Literature Reviews

By the PulseAugur editorial team · [2 sources] · 2026-04-28 04:00

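Two new research papers examine the use of large language models (LLMs) in systematic reviewing. The first introduces a large-scale, cross-disciplinary corpus of more than 300,000 systematic reviews, intended to improve benchmarking of retrieval and screening components. The second, LLM4SCREENLIT, offers recommendations for evaluating LLM performance in literature screening and proposes a weighted Matthews correlation coefficient (WMCC) that better accounts for the imbalanced nature of the task.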
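To illustrate why class imbalance matters for screening metrics, here is a minimal sketch that compares accuracy with the standard, unweighted Matthews correlation coefficient. It does not implement WMCC: the weighting scheme is not described in this summary, so none is assumed. The function names and the two confusion matrices are hypothetical and chosen only for illustration.

```python
import math


def accuracy(tp, fp, fn, tn):
    """Fraction of all screening decisions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp, fp, fn, tn):
    """Standard (unweighted) Matthews correlation coefficient."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention: report 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical screening run: 1,000 candidate papers, 50 truly relevant.
# Screener A excludes every paper.
exclude_all = dict(tp=0, fp=0, fn=50, tn=950)
# Screener B finds 40 of the 50 relevant papers, with 100 false includes.
useful = dict(tp=40, fp=100, fn=10, tn=850)

for name, cm in [("exclude_all", exclude_all), ("useful", useful)]:
    print(f"{name}: accuracy={accuracy(**cm):.3f}, mcc={mcc(**cm):.3f}")
```

In this made-up example, accuracy ranks the screener that excludes everything above the one that actually finds relevant papers, while MCC ranks them the other way round.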

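Impact: New datasets and evaluation metrics for LLMs in systematic reviewing are expected to improve the efficiency and accuracy of scientific literature analysis.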

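Ranking rationale: This cluster contains two academic papers posted on arXiv that describe new datasets and evaluation methods for LLMs in systematic reviewing.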


AI-generated summary · Google Gemini · based on 2 sources.

Sources [2]

arXiv cs.CL · Pierre Achkar, Tim Gollub, Arno Simons, Harrisen Scells, Martin Potthast · 2026-04-28 04:00

A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews

arXiv:2604.22864v1 Announce Type: cross Abstract: Existing benchmarks for systematic reviewing remain limited either in scale or in disciplinary coverage, with some collections comprising only a modest number of topics and others focusing primarily on biomedical research. We pres…

arXiv cs.LG · Lech Madeyski, Barbara Kitchenham, Martin Shepperd · 2026-04-28 04:00

LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

arXiv:2511.12635v2 Announce Type: replace-cross Abstract: Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetr…
