Brief · PulseAugur

RESEARCH · OpenAI News · 121mo · [321 sources]

RL²: Fast reinforcement learning via slow reinforcement learning

OpenAI has published a series of research papers on advances in reinforcement learning (RL). They include reaching superhuman performance in Dota 2 with OpenAI Five, developing benchmarks for safe exploration in RL environments, and quantifying generalization with the new CoinRun environment. The work also explores new methods for encouraging exploration through curiosity, learning policy representations in multiagent systems, and evolving loss functions so that agents train faster on new tasks. OpenAI is also working on variance-reduction techniques for policy gradients and on the equivalence between policy gradients and soft Q-learning.
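The brief does not say how curiosity-driven exploration works. Below is a rough illustration of the idea behind Random Network Distillation (one of the tagged topics). A fixed, randomly initialised target network embeds each observation. A predictor network is trained to match that embedding on the states the agent visits, and the predictor's error serves as an intrinsic reward, so familiar states earn a small bonus and novel ones a large bonus. This is a toy sketch, not OpenAI's implementation. It assumes linear "networks", plain SGD, and made-up dimensions, and the class `RNDBonus` and all of its parameters are hypothetical.

```python
import random


def matvec(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def random_matrix(rows, cols, rng):
    """Gaussian-initialised weight matrix (list of rows)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class RNDBonus:
    """Hypothetical toy RND bonus using linear maps instead of deep networks."""

    def __init__(self, obs_dim, feat_dim, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Target network: randomly initialised and never trained.
        self.target = random_matrix(feat_dim, obs_dim, rng)
        # Predictor network: trained to imitate the target on visited states.
        self.predictor = random_matrix(feat_dim, obs_dim, rng)
        self.lr = lr

    def _error(self, obs):
        pred = matvec(self.predictor, obs)
        targ = matvec(self.target, obs)
        return [p - t for p, t in zip(pred, targ)]

    def bonus(self, obs):
        """Intrinsic reward: squared error between predictor and target features."""
        return sum(e * e for e in self._error(obs))

    def update(self, obs):
        """One SGD step on 0.5 * ||predictor(obs) - target(obs)||^2."""
        err = self._error(obs)
        for i, e in enumerate(err):
            for j, x in enumerate(obs):
                # d/dW_ij of the loss is e_i * x_j.
                self.predictor[i][j] -= self.lr * e * x


# Usage: repeatedly visiting one state drives its bonus toward zero,
# while a state the predictor was never trained on keeps its bonus.
rnd = RNDBonus(obs_dim=4, feat_dim=8)
familiar = [1.0, 0.0, 0.0, 0.0]
novel = [0.0, 1.0, 0.0, 0.0]

before = rnd.bonus(familiar)
for _ in range(200):
    rnd.update(familiar)
after = rnd.bonus(familiar)
novel_bonus = rnd.bonus(novel)
print(f"familiar before={before:.3f} after={after:.2e} novel={novel_bonus:.3f}")
```

In the real method both networks are deep and the bonus is added to the environment reward during policy optimisation. The linear version only shows why prediction error falls for states the agent has seen often.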

IMPACT These advancements in reinforcement learning, including new benchmarks and methods for generalization and exploration, could accelerate the development of more capable and safer AI systems.

OpenAI
OpenAI Five
Dota 2
CoinRun
Safety Gym
Random Network Distillation (RND)
Evolved Policy Gradients (EPG)
Proximal Policy Optimization (PPO)
IMPALA
A3C
Ilya Sutskever
Pieter Abbeel
Wojciech Zaremba
Matthias Plappert
Montezuma's Revenge