Brief · PulseAugur

TOOL · arXiv cs.AI English(EN) · 8h

Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts

A new arXiv paper presents counterexamples showing that Monte Carlo Exploring Starts (MCES), a reinforcement learning algorithm, can converge to suboptimal solutions. The counterexamples cover both initial-visit and first-visit MCES and point to two sources of failure: sample-average updates and the balance between exploration and exploitation. The authors propose a modification that scales learning rates inversely to update frequencies on a state-by-state basis; it is shown to guarantee convergence to optimality and is applicable to large-scale problems.
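To make the contrast concrete, here is a minimal Python sketch of two step-size rules for a tabular Monte Carlo update. It is an illustration under stated assumptions, not the paper's algorithm: the brief does not give the exact rule, so the "per_state" rule below (step size 1 / number of updates made to the state, across all of its actions) is only one possible reading of "scaling learning rates inversely to update frequencies on a state-by-state basis". The class name `StepSizeTable` and its methods are hypothetical.

```python
from collections import defaultdict


class StepSizeTable:
    """Tabular action values with two alternative step-size rules.

    "sample_average": alpha = 1 / (updates to this state-action pair).
    "per_state":      alpha = 1 / (updates to this state, any action).
                      ASSUMPTION: an illustrative reading of the brief,
                      not the rule from the paper.
    """

    def __init__(self, rule="per_state"):
        if rule not in ("sample_average", "per_state"):
            raise ValueError(f"unknown rule: {rule}")
        self.rule = rule
        self.q = defaultdict(float)            # (state, action) -> value estimate
        self.pair_updates = defaultdict(int)   # (state, action) -> update count
        self.state_updates = defaultdict(int)  # state -> update count

    def update(self, state, action, ret):
        """Move Q(state, action) toward the observed return; return alpha."""
        key = (state, action)
        self.pair_updates[key] += 1
        self.state_updates[state] += 1
        if self.rule == "sample_average":
            alpha = 1.0 / self.pair_updates[key]
        else:
            alpha = 1.0 / self.state_updates[state]
        self.q[key] += alpha * (ret - self.q[key])
        return alpha


def demo(rule):
    """Apply the same three updates under the given rule."""
    table = StepSizeTable(rule)
    alphas = [
        table.update("s", "a", 1.0),
        table.update("s", "b", 0.0),
        table.update("s", "a", 3.0),
    ]
    return alphas, table.q[("s", "a")]
```

Under the sample-average rule, each action's step size depends only on how often that action was updated, so frequently chosen (typically greedy) actions and rarely chosen ones shrink their step sizes at different speeds. Under the per-state rule sketched here, all actions in a state share one schedule.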

IMPACT Highlights critical dependencies between learning rates and update frequencies for convergence in reinforcement learning algorithms.

reinforcement learning
exploration
Monte Carlo Exploring Starts
initial-visit MCES
first-visit MCES
sample-average updates
greedy actions
non-greedy actions
Monte Carlo control