PulseAugur

New Corpus and Metrics Advance the Use of Large Language Models in Systematic Literature Reviews

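Two new research papers examine the use of large language models (LLMs) in systematic reviewing. The first introduces a large-scale, cross-disciplinary corpus of more than 300,000 systematic reviews, intended to improve benchmarking of retrieval and screening components. The second, LLM4SCREENLIT, offers recommendations for assessing how well LLMs perform at literature screening and proposes a weighted Matthews correlation coefficient (WMCC) to better account for the imbalanced nature of the task.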
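To make the imbalance concern concrete, the sketch below computes plain accuracy and the standard (unweighted) Matthews correlation coefficient for two hypothetical screeners. The summary does not give the WMCC formula, so no weighting is attempted here; the function names and all confusion-matrix counts are illustrative assumptions, not figures from either paper.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of records classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Standard (unweighted) Matthews correlation coefficient.

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical screening task: 1000 records, only 50 are relevant.
# Screener A excludes every record; screener B finds 40 of the 50
# relevant records at the cost of 100 false inclusions.
always_exclude = dict(tp=0, fp=0, fn=50, tn=950)
useful_screener = dict(tp=40, fp=100, fn=10, tn=850)

for name, counts in [("always_exclude", always_exclude),
                     ("useful_screener", useful_screener)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, "
          f"mcc={mcc(**counts):.3f}")
```

In this made-up case the screener that excludes everything has the higher accuracy but an MCC of zero, while the screener that actually recovers relevant records scores lower on accuracy and clearly higher on MCC. A cost-sensitive variant such as WMCC would presumably go further by weighting the error types differently, but that definition should be taken from the paper itself.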

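Impact: New datasets and evaluation metrics for LLMs in systematic reviewing are expected to improve the efficiency and accuracy of scientific literature analysis.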

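Ranking rationale: This cluster comprises two academic papers posted on arXiv that detail new datasets and evaluation methods for LLMs in systematic reviewing.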

AI-generated summary · Google Gemini · from 2 sources.

Sources [2]

  1. arXiv cs.CL · Pierre Achkar, Tim Gollub, Arno Simons, Harrisen Scells, Martin Potthast

    A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews

    arXiv:2604.22864v1 Announce Type: cross Abstract: Existing benchmarks for systematic reviewing remain limited either in scale or in disciplinary coverage, with some collections comprising only a modest number of topics and others focusing primarily on biomedical research. We pres…

  2. arXiv cs.LG · Lech Madeyski, Barbara Kitchenham, Martin Shepperd

    LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

    arXiv:2511.12635v2 Announce Type: replace-cross Abstract: Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetr…