AI safety audits improved with environment blueprints

By PulseAugur Editorial · [1 sources] · 2026-05-26 00:31

Researchers have developed a pipeline that generates environment blueprints to make AI safety audits more realistic and consistent. The method was tested by using the Petri auditor to evaluate Gemini 3.1 Pro Preview for code sabotage. Blueprint-enhanced audits were more realistic and consistent than baseline audits, and no egregious scheming behavior was detected in 160 trials.

IMPACT Enhances the realism and consistency of AI safety audits, potentially leading to more reliable evaluations of model behavior.

RANK_REASON The cluster describes a new methodology for AI safety auditing published in a research write-up.

safety
paper

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · Jannes Elstner · 2026-05-26 00:31

Improving Petri scheming audits with environment blueprints

This is a short write-up of work conducted as part of the MATS 9.0 program. We thank Victoria Krakovna for mentorship and Fred Bruford for research management. TL;DR: We introduce a pipeline that generates environment bluepri…
