PulseAugur / Brief

last 24h
[1/1] 224 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.

  1. Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio

    Researchers have developed MetaSyn, a new benchmark for evaluating large language model (LLM) agents on the complex task of meta-analysis. The benchmark consists of 442 expert-curated meta-analyses from Nature Portfolio journals, including detailed criteria, a large corpus of PubMed articles, and verified positive and negative studies. Initial testing revealed that current LLM agents struggle significantly with the study selection phase: despite strong retrieval capabilities, they fail to reliably distinguish eligible literature from topically similar but ineligible distractors.

    IMPACT: Highlights a critical bottleneck in LLM agent capabilities for scientific reasoning, particularly in complex information synthesis tasks.