Two new research papers explore the application of large language models (LLMs) to systematic reviews. The first paper introduces a large-scale, cross-disciplinary corpus of over 300,000 systematic reviews, designed to improve benchmarking for retrieval and screening components. The second paper, LLM4SCREENLIT, provides recommendations for evaluating LLM performance in literature screening and proposes a Weighted Matthews Correlation Coefficient (WMCC) to better account for the imbalanced nature of this task.
AI IMPACT New datasets and evaluation metrics for LLMs in systematic reviews could improve the efficiency and accuracy of scientific literature analysis.
RANK_REASON The cluster contains two academic papers published on arXiv, detailing new datasets and evaluation methodologies for LLMs in systematic reviews.
AI-generated summary · Google Gemini · from 2 sources.