LLM detectors unreliable for academic peer review, study finds

By PulseAugur Editorial · [1 source] · 2026-06-24 04:00

A new study published on arXiv finds that current methods for detecting Large Language Model (LLM) use in academic peer reviews are not reliable. The researchers evaluated five state-of-the-art detectors, including commercial systems, and found that they frequently misclassify human-AI collaborative reviews as fully AI-generated. Such misclassifications could lead to false accusations of academic misconduct and overstate how often reviewers violate policies on LLM use in peer review.

IMPACT Current LLM detection tools are not reliable for academic peer reviews; relying on them risks false accusations of misconduct and overstated estimates of AI usage.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI · Rounak Saha, Gurusha Juneja, Dayita Chaudhuri, Naveeja Sajeevan, Nihar B Shah, Danish Pruthi · 2026-06-24 04:00

Policies Permitting LLM Use for Polishing Peer Reviews Are Currently Not Enforceable

arXiv:2603.20450v2 · Abstract: A number of scientific conferences and journals have recently enacted policies that prohibit LLM usage by peer reviewers, except for polishing, paraphrasing, and grammar correction of otherwise human-written reviews. But, …
