A new paper presents a certified counterexample to the convergence of Monte Carlo optimistic policy iteration when using nonuniform update frequencies. The research demonstrates that fixed nonuniform state-selection probabilities can lead to a stochastic recursion that fails to converge, instead becoming trapped near a periodic orbit. This finding highlights a geometric obstruction where uniform sampling provides radial contraction, while nonuniform sampling can distort dynamics and create attracting cycles. AI
影响 Highlights theoretical limitations in reinforcement learning algorithms, potentially impacting future algorithm design.
排序理由 Academic paper published on arXiv detailing a theoretical counterexample in reinforcement learning. [lever_c_demoted from research: ic=1 ai=1.0]
AI 生成摘要 · Google Gemini · 来自 1 个来源。 我们如何撰写摘要 →