Analysis of AlphaZero training data [D]
A user is training an AlphaZero model for Othello on a 6x6 board and running into performance problems. Although successive models improve against one another, they still perform poorly against benchmark agents, winning fewer than 10% of games against a greedy agent. The user has analyzed the training data, including value loss, prediction entropy, and policy divergence, and is asking for advice on hyperparameter tuning to fix the poor performance.
IMPACT: The user seeks to improve the training methodology for reinforcement learning agents.
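The summary names three diagnostics (value loss, prediction entropy, policy divergence) without saying how they were computed. The sketch below shows one common way such quantities might be defined; it is illustrative only and not the user's actual code. The function names, the action count of 37 (36 squares plus a pass move), and the choice of KL divergence between the search target and the network policy are all assumptions.

```python
import math

# Assumption: 36 board squares plus one "pass" action on a 6x6 board.
N_ACTIONS = 37


def policy_entropy(p):
    """Shannon entropy (in nats) of one move distribution; zero entries are skipped."""
    return -sum(x * math.log(x) for x in p if x > 0)


def kl_divergence(target, pred, eps=1e-12):
    """KL(target || pred), e.g. MCTS visit distribution vs. the network's policy.

    `pred` is clamped at `eps` so a zero prediction does not cause a division error.
    """
    return sum(
        t * math.log(t / max(q, eps))
        for t, q in zip(target, pred)
        if t > 0
    )


def value_mse(outcomes, predictions):
    """Mean squared error between game outcomes (e.g. +1 / -1) and value-head outputs."""
    return sum((z - v) ** 2 for z, v in zip(outcomes, predictions)) / len(outcomes)


if __name__ == "__main__":
    # A uniform policy gives the maximum possible entropy, log(N_ACTIONS),
    # which is a useful reference line when plotting entropy over training.
    uniform = [1.0 / N_ACTIONS] * N_ACTIONS
    print("entropy of uniform policy:", policy_entropy(uniform))
    print("upper bound log(N_ACTIONS):", math.log(N_ACTIONS))

    # A sharply peaked search target compared with a uniform network policy.
    peaked = [0.0] * N_ACTIONS
    peaked[0] = 1.0
    print("KL(peaked || uniform):", kl_divergence(peaked, uniform))
    print("value MSE:", value_mse([1.0, -1.0], [0.5, -0.5]))
```

Under these definitions, entropy that stays near `log(N_ACTIONS)` would suggest the policy head is barely learning, while entropy collapsing toward zero early on could point to too little exploration. A KL divergence that remains large would mean the network is not tracking its own search targets. Which of these patterns applies here is not stated in the summary.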