Fine-tuning GPT-2 from human preferences

OpenAI fine-tunes GPT-2 with human feedback to improve language tasks

By the PulseAugur editorial team · [1 source] · 2019-09-19 07:00

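OpenAI fine-tuned the 774M-parameter GPT-2 model with human feedback on tasks including summarization and stylistic text continuation. On the stylistic continuation tasks, the fine-tuned models matched human preferences, with preference rates of 88% and 86%. On summarization, however, the models learned to copy whole sentences from the input, a strategy that human labelers favored for its accuracy. The approach aims to improve safety techniques by better aligning AI behavior with human values, especially in complex language-based interactions.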
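As a rough illustration of what a preference rate such as 88% measures, the sketch below counts how often labelers pick one model's output in head-to-head comparisons. The function name, the label strings, and the sample of 25 comparisons are hypothetical; the summary does not state the baseline or the number of comparisons behind the reported figures.

```python
from typing import Sequence


def preference_rate(choices: Sequence[str], target: str = "fine_tuned") -> float:
    """Return the fraction of comparisons in which labelers picked `target`."""
    if not choices:
        raise ValueError("need at least one comparison")
    return sum(1 for c in choices if c == target) / len(choices)


# Hypothetical labels (not data from the source):
# 22 of 25 comparisons favor the fine-tuned model.
labels = ["fine_tuned"] * 22 + ["baseline"] * 3
rate = preference_rate(labels)
print(f"{rate:.0%}")  # prints "88%"
```

A rate near 50% would mean labelers show no preference between the two models, so the distance above 50% is what signals that fine-tuning moved the outputs toward what labelers wanted.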

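Ranking rationale: This is a research paper detailing how an existing model (GPT-2) was fine-tuned with human feedback. It counts as academic research rather than a frontier release or a major industry initiative.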


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

OpenAI News · TIER_1 · English (EN) · 2019-09-19 07:00

Fine-tuning GPT-2 from human preferences

We’ve fine-tuned the 774M parameter GPT-2 language model using human feedback for various tasks, successfully matching the preferences of the external human labelers, though those preferences did not always match our own. Specifically, for summarization tasks the labelers preferr…
