Learning Montezuma’s Revenge from a single demonstration

By the PulseAugur editorial team · [1 source] · 2018-07-04 07:00

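OpenAI has trained a reinforcement learning agent that reaches a score of 74,500 on Montezuma’s Revenge from a single human demonstration. The agent starts each training episode from a state taken from the demonstration, which greatly reduces the exploration problem inherent in conventional reinforcement learning. It can therefore concentrate on learning a good sequence of actions instead of exploring at random, and its score surpasses previously published results.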
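The idea of starting episodes from demonstration states can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not OpenAI's implementation: the toy `ChainEnv` game, its `restore` method (a stand-in for resetting an emulator to a saved state), the tabular Q-learning learner, and the schedule that begins next to the end of the demonstration and moves the start point earlier once the success rate in a batch passes a threshold.

```python
import random


class ChainEnv:
    """Hypothetical toy game: walk along a line; reward only on reaching the end."""

    def __init__(self, length=20):
        self.length = length
        self.pos = 0

    def restore(self, state):
        # Stand-in for resetting an emulator to a saved demonstration state.
        self.pos = state
        return self.pos

    def step(self, action):
        # Action 1 moves right, action 0 moves left (clamped at position 0).
        self.pos = max(0, self.pos + (1 if action == 1 else -1))
        done = self.pos >= self.length
        return self.pos, (1.0 if done else 0.0), done


def train_from_demo(env, demo_states, batch=20, threshold=0.8, slack=10,
                    eps=0.1, alpha=0.5, gamma=0.99, max_batches=2000, seed=0):
    """Tabular Q-learning with start states drawn from a demonstration (assumed schedule)."""
    rng = random.Random(seed)
    q = {}  # state -> [value of moving left, value of moving right]
    start = len(demo_states) - 1  # begin next to the end of the demonstration

    for _ in range(max_batches):
        wins = 0
        for _ in range(batch):
            s = env.restore(demo_states[start])
            # Step budget: remaining demonstration length plus slack for mistakes.
            for _ in range(len(demo_states) - start + slack):
                values = q.setdefault(s, [0.0, 0.0])
                if rng.random() < eps or values[0] == values[1]:
                    a = rng.randrange(2)  # explore, or break a tie at random
                else:
                    a = 0 if values[0] > values[1] else 1
                s2, r, done = env.step(a)
                future = 0.0 if done else gamma * max(q.setdefault(s2, [0.0, 0.0]))
                values[a] += alpha * (r + future - values[a])
                s = s2
                if done:
                    wins += 1
                    break
        if wins / batch >= threshold:
            if start == 0:
                return q, True  # solved from the first demonstration state
            start -= 1  # mastered this start point; move it earlier in the demo
    return q, False


def greedy_rollout(env, q, start_state, max_steps):
    """Follow the learned values greedily; return steps taken, or None on failure."""
    s = env.restore(start_state)
    for t in range(max_steps):
        values = q.get(s, [0.0, 0.0])
        s, _, done = env.step(0 if values[0] > values[1] else 1)
        if done:
            return t + 1
    return None


if __name__ == "__main__":
    env = ChainEnv(length=20)
    demo = list(range(20))  # states visited by a demonstrator who always moves right
    q, solved = train_from_demo(env, demo)
    print("solved from the first demo state:", solved)
    print("greedy steps to goal:", greedy_rollout(env, q, demo[0], max_steps=20))
```

Because each stage starts one step earlier than a point the agent already handles, it only has to discover a short stretch of new behaviour at a time, which is the sense in which the exploration problem shrinks. In a real game the restore step would depend on the emulator being able to save and reload states.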

Ranking rationale: this cluster describes a research paper detailing a new reinforcement learning technique that OpenAI developed for games.


AI-generated summary · Google Gemini · from 1 source.

Sources [1]

OpenAI News TIER_1 English(EN) · 2018-07-04 07:00

Learning Montezuma’s Revenge from a single demonstration

We’ve trained an agent to achieve a high score of 74,500 on Montezuma’s Revenge from a single human demonstration, better than any previously published result. Our algorithm is simple: the agent plays a sequence of games starting from carefully chosen states from the demonstratio…
