Brief · PulseAugur

RESEARCH · arXiv cs.IR (Information Retrieval) English(EN) · 18h · [2 sources]

Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

Researchers have introduced MetaSyn, a benchmark for evaluating Large Language Model (LLM) agents on the complex task of meta-analysis. It comprises 442 expert-curated meta-analyses from Nature Portfolio journals, along with detailed criteria, a large corpus of PubMed articles, and verified positive and negative studies. Initial testing found that current LLM agents struggle significantly with the study selection phase: despite strong retrieval capabilities, they fail to reliably distinguish eligible studies from topically similar but ineligible distractors.

IMPACT Highlights study selection as a critical bottleneck for LLM agents in scientific reasoning, particularly in complex information synthesis tasks.

LLM Agents
PubMed
Nature Portfolio