OpenAI has developed a safety mechanism called Rule-Based Rewards (RBRs) that reduces the need for extensive human feedback when training AI models. The method scores model responses against predefined safety rules, complementing traditional reinforcement learning from human feedback. RBRs have been part of OpenAI's safety stack since the GPT-4 launch and are planned for future models, with the goal of keeping responses helpful while preventing harmful outputs.
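The core idea described above, combining binary rule checks into a scalar reward that supplements the RL training signal, can be sketched roughly as follows. This is a minimal illustration only: the rule names, weights, and string checks here are hypothetical and do not reflect OpenAI's actual rubric or implementation.

```python
# Hypothetical sketch of a rule-based reward: each rule is a binary
# proposition about a response; the reward is a weighted fraction of
# rules satisfied. Not OpenAI's actual method, just the general shape.

def rule_refuses_politely(response: str) -> bool:
    # Proposition: the response refuses without being judgmental.
    text = response.lower()
    return "can't help" in text and "you are wrong" not in text

def rule_no_disallowed_detail(response: str) -> bool:
    # Proposition: the response contains no step-by-step harmful detail.
    return "step-by-step instructions" not in response.lower()

# (rule, weight) pairs; weights are arbitrary for this sketch.
RULES = [
    (rule_refuses_politely, 1.0),
    (rule_no_disallowed_detail, 2.0),
]

def rule_based_reward(response: str) -> float:
    """Combine rule checks into a scalar in [0, 1] to add to the RL signal."""
    total_weight = sum(w for _, w in RULES)
    score = sum(w for rule, w in RULES if rule(response))
    return score / total_weight

print(rule_based_reward("I'm sorry, I can't help with that."))  # → 1.0
```

In practice such rules would be evaluated by a grader model rather than string matching, but the reward-shaping structure is the same: cheap, auditable rules substitute for some of the human preference labels.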
Summary written by gemini-2.5-flash-lite from 1 source.