PulseAugur

GPT-5.2 matches expert human performance in scientific peer review

A recent evaluation found that GPT-5.2, an OpenAI language model, performs comparably to human experts in scientific peer review. In the study, 45 scientists spent 469 hours evaluating human and AI reviews of 82 papers, and the AI reviews were competitive even with the top-rated reviewers in Nature's official peer review. The AI reviewers are not without weaknesses, however, which suggests that combining AI and human review may work better than either alone.

IMPACT AI models are becoming competitive with human experts in complex tasks like scientific peer review, suggesting potential for increased efficiency and broader adoption.

RANK_REASON The cluster describes a research evaluation of an AI model's capabilities in a specific domain (scientific peer review), not a new model release or a significant industry-wide event.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. Bluesky Jetstream — AI desk TIER_1 English(EN) · emollick.bsky.social ·

    Seems GPT-5.2 reaches expert level in peer review: 45 scientists took 469 hours evaluating human & AI reviews on 82 papers. "Surprisingly, current AI reviewers are competitive even with the top-rated reviewers in Nature’s official peer review..." though not without weaknesses, s…