New corpus and metrics advance LLM use in systematic literature reviews

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 2 sources

Two new research papers explore the application of large language models (LLMs) to systematic reviews. The first introduces a large-scale, cross-disciplinary corpus of over 300,000 systematic reviews, designed to improve benchmarking of retrieval and screening components. The second, LLM4SCREENLIT, offers recommendations for evaluating LLM performance in literature screening and proposes a Weighted Matthews Correlation Coefficient (WMCC) to better account for the imbalanced nature of this task.
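To illustrate why imbalance matters when scoring a screener, here is a minimal sketch in Python comparing accuracy with the standard, unweighted Matthews Correlation Coefficient. The weighting scheme behind WMCC is not described in this summary, so it is not reproduced here; the function names and the confusion-matrix counts are hypothetical and chosen only for illustration.

```python
import math


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all screening decisions that were correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Standard (unweighted) Matthews Correlation Coefficient.

    Returns 0.0 when the denominator is zero, a common convention
    for degenerate confusion matrices.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical screening run: 1,000 candidate papers, 50 truly relevant.
# The screener keeps 10 relevant papers, misses 40, and wrongly keeps 5.
tp, fp, tn, fn = 10, 5, 945, 40

print(f"accuracy = {accuracy(tp, fp, tn, fn):.3f}")
print(f"recall   = {tp / (tp + fn):.3f}")
print(f"MCC      = {mcc(tp, fp, tn, fn):.3f}")
```

In this made-up run, accuracy is high because most papers are irrelevant and easy to exclude, while recall and MCC expose that most relevant papers were missed.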


IMPACT New datasets and evaluation metrics for LLMs in systematic reviews could improve the efficiency and accuracy of scientific literature analysis.

RANK_REASON The cluster contains two academic papers published on arXiv, detailing new datasets and evaluation methodologies for LLMs in systematic reviews.


COVERAGE [2]

arXiv cs.CL TIER_1 · Pierre Achkar, Tim Gollub, Arno Simons, Harrisen Scells, Martin Potthast · 2026-04-28 04:00

A Large-Scale, Cross-Disciplinary Corpus of Systematic Reviews

arXiv:2604.22864v1 Announce Type: cross Abstract: Existing benchmarks for systematic reviewing remain limited either in scale or in disciplinary coverage, with some collections comprising only a modest number of topics and others focusing primarily on biomedical research. We pres…

arXiv cs.LG TIER_1 · Lech Madeyski, Barbara Kitchenham, Martin Shepperd · 2026-04-28 04:00

LLM4SCREENLIT: Recommendations on Assessing the Performance of Large Language Models for Screening Literature in Systematic Reviews

arXiv:2511.12635v2 Announce Type: replace-cross Abstract: Context: Large language models (LLMs) are increasingly used to screen literature for systematic reviews (SRs), but the standard confusion-matrix metrics used to evaluate them can mislead under the imbalanced, cost-asymmetr…

