OpenAI has developed a new evaluation method to assess the risk of large language models aiding in the creation of biological threats. Their initial study, involving biology experts and students, found that GPT-4 provided only a mild, statistically insignificant uplift in accuracy on threat-creation tasks compared to internet-only access. This research is part of OpenAI's broader Preparedness Framework and aims to contribute to community understanding and the development of safety evaluations for AI-enabled risks.
Summary written by gemini-2.5-flash-lite from 1 source.