Scalar-Stepsize Nonuniform Monte Carlo Optimistic Policy Iteration: A Certified Counterexample
A new paper presents a certified counterexample to the convergence of Monte Carlo optimistic policy iteration when using nonuniform update frequencies. The research demonstrates that fixed nonuniform state-selection probabilities can lead to a stochastic recursion that fails to converge, instead becoming trapped near a periodic orbit. This finding highlights a geometric obstruction where uniform sampling provides radial contraction, while nonuniform sampling can distort dynamics and create attracting cycles. AI
IMPACT Highlights theoretical limitations in reinforcement learning algorithms, potentially impacting future algorithm design.