
TOOL · arXiv cs.AI English(EN) · 7h

Evaluating AI-based Scientific Knowledge Synthesis with Epidemiological Systematic Reviews

Researchers have developed AgentSLR, a new evaluation framework for AI models that perform scientific knowledge synthesis, focused on epidemiological systematic literature reviews. The framework includes a dataset of over 16,000 articles and metrics for each stage of the review process. In tests of five leading reasoning models, none excelled across all tasks. Structured data extraction proved especially difficult: no model achieved an F1 score above 0.67 on it. The findings indicate that current AI models are not yet reliable enough for unsupervised use in critical fields like epidemiology, where decisions can affect public policy.
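For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below uses the standard definition with hypothetical counts; the brief does not say how AgentSLR matches extracted fields or aggregates scores, so the function name and the numbers here are illustrative assumptions only.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Standard F1: the harmonic mean of precision and recall."""
    if true_positives == 0:
        # No correct extractions, so precision and recall are both zero.
        return 0.0
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, not taken from the paper: 60 correctly extracted
# fields, 30 spurious extractions, 30 fields that were missed.
score = f1_score(true_positives=60, false_positives=30, false_negatives=30)
print(round(score, 2))
```

With these made-up counts, a model that gets one in three extractions wrong and misses one in three fields scores about two thirds, which is roughly the ceiling the brief reports.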

IMPACT: Highlights current limitations of AI in complex scientific synthesis, suggesting a need for further development before unsupervised deployment in policy-relevant domains.

AI
epidemiology
Shreyansh Padarha
AgentSLR