Gathering human feedback

OpenAI Releases RL-Teacher for Training AI with Human Feedback

By the PulseAugur editorial team · [1 source] · 2017-08-03 07:00

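OpenAI has released RL-Teacher, an open-source tool for training AI models from human feedback rather than predefined reward functions. Developed with AI safety in mind, the approach centers on a reward predictor that learns human preferences and can be integrated with a variety of AI agents. The system also includes a web application through which humans provide feedback; that feedback is then used to train the predictor. The whole tool is implemented in fewer than 1,000 lines of Python.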
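To make the idea of a reward predictor learned from human preferences more concrete, here is a minimal sketch in plain Python. It is not RL-Teacher's code: it assumes a linear reward model and a Bradley-Terry style preference likelihood, which is a common formulation for this kind of learning, and it replaces the web application with a hypothetical `simulated_human` function. Every name below is invented for illustration.

```python
import math
import random

def segment_return(weights, segment):
    """Sum of predicted per-step rewards over a segment (a list of feature vectors)."""
    return sum(sum(w * x for w, x in zip(weights, step)) for step in segment)

def preference_prob(weights, seg_a, seg_b):
    """Bradley-Terry style probability that seg_a is preferred over seg_b."""
    diff = segment_return(weights, seg_a) - segment_return(weights, seg_b)
    diff = max(-30.0, min(30.0, diff))  # clamp to keep exp() numerically safe
    return 1.0 / (1.0 + math.exp(-diff))

def train_reward_predictor(comparisons, n_features, lr=0.05, epochs=200):
    """Fit linear reward weights to (seg_a, seg_b, label) triples.

    label is 1 when the human preferred seg_a and 0 when they preferred seg_b.
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        for seg_a, seg_b, label in comparisons:
            p = preference_prob(weights, seg_a, seg_b)
            # Gradient of the log-likelihood: (label - p) * (features_a - features_b)
            for i in range(n_features):
                feat_diff = sum(s[i] for s in seg_a) - sum(s[i] for s in seg_b)
                weights[i] += lr * (label - p) * feat_diff
    return weights

# Hypothetical stand-in for the human: prefers the segment with the higher
# return under a hidden "true" reward that the predictor never sees directly.
TRUE_WEIGHTS = (1.0, -0.5)

def simulated_human(seg_a, seg_b):
    better = segment_return(TRUE_WEIGHTS, seg_a) > segment_return(TRUE_WEIGHTS, seg_b)
    return 1 if better else 0

def random_segment(rng, length=5, n_features=2):
    return [[rng.uniform(-1, 1) for _ in range(n_features)] for _ in range(length)]

rng = random.Random(0)
comparisons = []
for _ in range(200):
    a, b = random_segment(rng), random_segment(rng)
    comparisons.append((a, b, simulated_human(a, b)))

learned = train_reward_predictor(comparisons, n_features=2)

# Check how often the learned predictor agrees with the simulated human
# on fresh pairs it was not trained on.
held_out = [(random_segment(rng), random_segment(rng)) for _ in range(200)]
agreement = sum(
    (preference_prob(learned, a, b) > 0.5) == bool(simulated_human(a, b))
    for a, b in held_out
) / len(held_out)
```

In a full system, an agent would then be trained against the learned reward in place of a hand-written one, and the comparisons would come from people judging pairs of behavior clips rather than from a scripted function.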

Ranking rationale: open-source release of a tool for training AI models with human feedback.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

OpenAI News TIER_1 English(EN) · 2017-08-03 07:00

Gathering human feedback

RL-Teacher is an open-source implementation of our interface to train AIs via occasional human feedback rather than hand-crafted reward functions. The underlying technique was developed as a step towards safe AI systems, but also applies to reinforcement learning problems with re…
