RL²: Fast reinforcement learning via slow reinforcement learning
OpenAI has published a series of research papers detailing advancements in reinforcement learning. These include achieving superhuman performance in Dota 2 with OpenAI Five, developing benchmarks for safe exploration in RL, and quantifying generalization capabilities with the CoinRun environment. The company also explored novel methods like prediction-based rewards for curiosity-driven exploration, learning policy representations in multiagent systems, and an experimental metalearning approach called Evolved Policy Gradients for faster training on new tasks. Further research addresses variance reduction in policy gradients and the equivalence between policy gradients and soft Q-learning, alongside challenging robotics environments for multi-goal RL.
IMPACT: Demonstrates progress in RL capabilities, including superhuman performance, safety, generalization, and exploration.
- OpenAI
- OpenAI Five
- Dota 2
- CoinRun
- Safety Gym
- Random Network Distillation (RND)
- Evolved Policy Gradients (EPG)
- Proximal Policy Optimization (PPO)
- IMPALA
- A3C
- Ilya Sutskever
- Pieter Abbeel
- Wojciech Zaremba
- Matthias Plappert
- Montezuma's Revenge