Brief · PulseAugur

TOOL · Bluesky Jetstream — AI desk English(EN) · 4d

Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers.

A recent evaluation found that GPT-5.2, an OpenAI language model, performs comparably to human experts in scientific peer review. In the study, 45 scientists spent 469 hours evaluating human and AI reviews of 82 papers, and the AI's reviews were found to be competitive with those from top-rated reviewers at a major scientific journal. The AI still exhibits weaknesses, however, which suggests that a hybrid approach combining AI and human reviewers works best for peer review.

IMPACT AI models are becoming competitive with human experts in complex tasks like scientific peer review, suggesting potential for increased efficiency and broader adoption.

OpenAI
Nature
Ethan Mollick
GPT-5.2