OpenAI releases open-weight reasoning models for safety classification

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 2 sources

OpenAI has released gpt-oss-safeguard, a set of open-weight reasoning models designed for safety classification tasks. Available in 120B and 20B parameter sizes, these models are fine-tuned versions of existing gpt-oss models and are released under the Apache 2.0 license. The key innovation is their ability to interpret and apply developer-provided policies at inference time, offering a more flexible and explainable approach to content moderation compared to traditional methods. AI

Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →

RANK_REASON OpenAI released open-weight models for safety classification tasks, accompanied by a technical report, fitting the research bucket.

Read on OpenAI News →

OpenAI releases open-weight reasoning models for safety classification

COVERAGE [2]

OpenAI News TIER_1 · 2025-10-29 00:00

gpt-oss-safeguard technical report

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are two open-weight reasoning models post-trained from the gpt-oss models and trained to reason from a provided policy in order to label content under that policy. In this report, we describe gpt-oss-safeguard’s capabilities and pr…
OpenAI News TIER_1 · 2025-10-29 00:00

Introducing gpt-oss-safeguard

OpenAI introduces gpt-oss-safeguard—open-weight reasoning models for safety classification that let developers apply and iterate on custom policies.

COVERAGE [2]

gpt-oss-safeguard technical report

Introducing gpt-oss-safeguard

RELATED TOPICS