AI safety researchers define three types of model organisms

By PulseAugur Editorial · [1 source] · 2026-06-10 08:50

Researchers have proposed a framework to categorize model organisms (MOs) used in AI safety research into three distinct types. Worst-case MOs serve as stress tests for safety mechanisms by simulating extreme failure scenarios. Natural MOs mimic realistic failure modes that can arise during actual AI training processes. Constructed MOs are deliberately engineered to exhibit specific, often unnatural, behaviors to study potential future AI capabilities and risks.

IMPACT Provides a structured way to think about and test AI safety mechanisms against potential future risks.

RANK_REASON The cluster describes a conceptual framework for AI safety research, presented in a blog post.

safety
paper

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

LessWrong (AI tag) TIER_1 English(EN) · Francis Rhys Ward · 2026-06-10 08:50

Three types of model organism

This is a short post to explain a distinction between three different types of model organism (MO) research: …
