Provably Convergent Actor-Critic for MARL through Risk-aversion
Researchers have developed a new actor-critic algorithm for multi-agent reinforcement learning (MARL) that addresses the challenge of learning stationary policies in general-sum Markov games. The algorithm builds on Risk-Averse Quantal Response Equilibria (RQE), a solution concept that incorporates risk aversion and bounded rationality, to guarantee convergence. The authors support the method with theoretical convergence guarantees and with experiments in which it outperforms risk-neutral approaches.
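To make the equilibrium concept more concrete, here is a minimal sketch for a one-shot two-player matrix game. It is not the authors' actor-critic algorithm and does not handle Markov games. It assumes an entropic risk measure to model risk aversion and a logit (softmax) quantal response to model bounded rationality, then searches for a fixed point by damped iteration. The function names and parameters (`entropic_value`, `risk_averse_qre`, `risk_aversion`, `rationality`, `damping`) are hypothetical, and this simple iteration is a heuristic that is not guaranteed to converge for every game or parameter setting.

```python
import numpy as np


def entropic_value(payoffs, probs, risk_aversion):
    """Risk-averse certainty equivalent -(1/a) * log E[exp(-a * X)].

    Assumes risk_aversion > 0. By Jensen's inequality this never exceeds the
    plain expectation, and it approaches the expectation as risk_aversion -> 0.
    """
    return -np.log(probs @ np.exp(-risk_aversion * payoffs)) / risk_aversion


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - z.max())
    return e / e.sum()


def risk_averse_qre(A, B, risk_aversion=1.0, rationality=2.0,
                    damping=0.5, iters=2000, tol=1e-10):
    """Damped fixed-point search for a risk-averse logit quantal response pair.

    Hypothetical illustration: A[i, j] is the row player's payoff and B[i, j]
    the column player's payoff when row plays i and column plays j.
    """
    n, m = A.shape
    x = np.full(n, 1.0 / n)  # row player's mixed strategy
    y = np.full(m, 1.0 / m)  # column player's mixed strategy
    for _ in range(iters):
        # Risk-adjusted value of each action against the opponent's current mix.
        u_row = np.array([entropic_value(A[i, :], y, risk_aversion) for i in range(n)])
        u_col = np.array([entropic_value(B[:, j], x, risk_aversion) for j in range(m)])
        # Bounded rationality: softmax response; damping mixes in the old strategy.
        x_new = (1 - damping) * x + damping * softmax(rationality * u_row)
        y_new = (1 - damping) * y + damping * softmax(rationality * u_col)
        done = max(np.abs(x_new - x).max(), np.abs(y_new - y).max()) < tol
        x, y = x_new, y_new
        if done:
            break
    return x, y
```

For example, with prisoner's dilemma payoffs `A = np.array([[3.0, 0.0], [5.0, 1.0]])` (rows: cooperate, defect) and `B = A.T`, `risk_averse_qre(A, B)` puts most of each player's weight on defection, while lowering `rationality` pulls both strategies toward uniform play.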
Impact: Introduces a new theoretical framework and algorithm for improving convergence in multi-agent reinforcement learning, with potential applications to complex coordination tasks.