English(EN) Learning from human preferences

OpenAI 利用人类偏好反馈训练 AI；Chip Huyen 提出预测模型路由

作者 PulseAugur 编辑部 · [2 个来源] · 2017-06-13 07:00

OpenAI 和 DeepMind 开发了一种新算法，可以从人类反馈中学习期望的行为，从而减少对显式目标函数的需求。该方法使用一个三步循环，人类比较两种代理行为，使 AI 能够推断奖励函数并提高其性能。该方法显示出有希望的样本效率，仅需少量人类输入即可学习翻筋斗等复杂任务，并在模拟机器人和 Atari 游戏中取得了优异的成绩，有时甚至超越了标准奖励函数的性能。然而，该系统容易受到欺骗人类评估者的代理的影响，目前正通过额外的视觉线索来解决这个问题。 AI

排序理由这描述了一种新算法及其在模拟任务上的评估，符合研究的定义。

在 OpenAI News 阅读 →

AI 生成摘要 · Google Gemini · 来自 2 个来源。我们如何撰写摘要 →

OpenAI 利用人类偏好反馈训练 AI；Chip Huyen 提出预测模型路由

报道来源 [2]

OpenAI News TIER_1 English(EN) · 2017-06-13 07:00

从人类偏好中学习

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal a bit wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety…
Chip Huyen TIER_1 English(EN) · 2024-02-28 00:00

预测性人类偏好：从模型排名到模型路由

<p>A challenge of building AI applications is choosing which model to use. What if we don’t have to? What if we can predict the best model for any prompt? Predictive human preference aims to predict which model users might prefer for a specific query.</p> <p>Human preference has …

报道来源 [2]

从人类偏好中学习

预测性人类偏好：从模型排名到模型路由

相关实体

相关话题