Two new research papers explore applying large language models (LLMs) to systematic reviews. The first introduces a large-scale, cross-disciplinary corpus of more than 300,000 systematic reviews, designed to improve benchmarking of retrieval and screening components. The second, LLM4SCREENLIT, offers recommendations for evaluating LLM performance in literature screening and proposes a Weighted Matthews Correlation Coefficient (WMCC) to better account for the heavy class imbalance inherent in the task.
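The exact weighting scheme of WMCC is defined in the paper itself; as a minimal sketch of why a correlation-based metric matters for imbalanced screening, here is plain MCC computed from hypothetical confusion counts (all numbers invented for illustration):

```python
from math import sqrt

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical screening run: 1000 candidate papers, only 20 truly relevant.
tp, fn, fp, tn = 10, 10, 30, 950

accuracy = (tp + tn) / (tp + fn + fp + tn)
print(f"accuracy = {accuracy:.2f}")             # high, despite missing half the relevant papers
print(f"MCC      = {mcc(tp, fp, fn, tn):.2f}")  # much lower, exposing the weak minority-class recall
```

Accuracy looks excellent here only because negatives dominate; MCC (and, per the paper, its weighted variant) penalizes the missed relevant papers that matter most in screening.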
Summary written by gemini-2.5-flash-lite from 2 sources.
IMPACT New datasets and evaluation metrics for LLMs in systematic reviews could improve the efficiency and accuracy of scientific literature analysis.
RANK_REASON The cluster contains two academic papers published on arXiv, detailing new datasets and evaluation methodologies for LLMs in systematic reviews.