A new study published on arXiv finds that current methods for detecting Large Language Model (LLM) use in academic peer reviews are unreliable. The researchers evaluated five state-of-the-art detectors, including commercial systems, and found that they frequently misclassify human-AI collaborative reviews as fully AI-generated. Such errors could lead to false accusations of academic misconduct and overstate the extent of policy violations regarding LLM use in peer review.
AI IMPACT Current LLM detection tools are unreliable for academic peer reviews, which could lead to false accusations of misconduct and overestimates of AI use.
RANK_REASON Research paper published on arXiv detailing findings about LLM detection in academic peer reviews.
AI-generated summary · Google Gemini · from 1 source.