Traditional evaluations and red-teaming remain essential, especially for rare or severe risks.
OpenAI has developed a method called Deployment Simulation to predict how AI models will behave in real-world use before they are released. The technique uses de-identified user data to simulate deployment conditions, and its results correlate strongly with observed behavior across various categories and GPT-5-series models. While traditional evaluations remain crucial, the simulation approach aims to estimate how often undesired behaviors occur and to surface new issues before deployment.
AI IMPACT: This simulation method could improve AI safety by identifying potential issues before models are widely deployed.