RL²: Fast reinforcement learning via slow reinforcement learning
OpenAI has published a series of research papers on advances in reinforcement learning (RL). These include achieving superhuman performance in Dota 2 with OpenAI Five, developing benchmarks for safe exploration in RL environments, and quantifying generalization with the new CoinRun environment. The research also explores methods for encouraging exploration through curiosity, learning policy representations in multiagent systems, and evolving loss functions for faster training on new tasks. Additionally, OpenAI is working on variance reduction techniques for policy gradients and on the equivalence between policy gradients and soft Q-learning.
AI IMPACT: These advances in reinforcement learning, including new benchmarks and methods for generalization and exploration, could accelerate the development of more capable and safer AI systems.
- OpenAI
- OpenAI Five
- Dota 2
- CoinRun
- Safety Gym
- IMPALA
- A3C
- Ilya Sutskever
- Pieter Abbeel
- Wojciech Zaremba
- Matthias Plappert
- Montezuma's Revenge
- Evolved Policy Gradients (EPG)
- Proximal Policy Optimization (PPO)
- Random Network Distillation (RND)