OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing

By PulseAugur Editorial · Summary by gemini-2.5-flash-lite from 2 sources

OpenAI and DeepMind have developed a new algorithm that learns desired behaviors from human feedback, reducing the need for explicit goal functions. This method uses a three-step cycle where humans compare two agent behaviors, allowing the AI to infer the reward function and improve its performance. The approach has shown promising sample efficiency, requiring minimal human input to learn complex tasks like a backflip, and has achieved strong results in simulated robotics and Atari games, sometimes surpassing performance with standard reward functions. However, the system can be susceptible to agents that trick human evaluators, a problem being addressed with additional visual cues. AI

Summary written by gemini-2.5-flash-lite from 2 sources. How we write summaries →

RANK_REASON This describes a new algorithm and its evaluation on simulated tasks, fitting the definition of research.

Read on OpenAI News →

OpenAI trains AI with human preference feedback; Chip Huyen proposes predictive model routing

COVERAGE [2]

OpenAI News TIER_1 · 2017-06-13 07:00

Learning from human preferences

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety…
Chip Huyen TIER_1 · 2024-02-28 00:00

Predictive Human Preference: From Model Ranking to Model Routing

<p>A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query.</p> <p>Human preference has …

COVERAGE [2]

Learning from human preferences

Predictive Human Preference: From Model Ranking to Model Routing

RELATED ENTITIES

RELATED TOPICS