PulseAugur

New TD(0) algorithm achieves simultaneous robust and fast convergence rates

Researchers have analyzed plain, unprojected linear TD(0) with Polyak-Ruppert averaging under Markovian sampling, where data come from a single trajectory. With a single stepsize schedule, the method achieves robust, curvature-free convergence rates and fast, curvature-dependent rates simultaneously, with high-probability guarantees. The analysis relies on a novel toolkit for geometrically mixing Markov chains that decomposes Markov noise into a martingale term and a controlled remainder, which enables a new self-bounding inductive argument for pathwise stability.
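To make the setup concrete, here is a minimal sketch of linear TD(0) with Polyak-Ruppert averaging along a single trajectory. Everything specific in it is an assumption for illustration and is not taken from the paper: the 3-state Markov reward process, the one-hot features, the stepsize `c / sqrt(t + 1)`, and the function names `td0_polyak_ruppert` and `true_values`. The paper's own stepsize schedule is truncated in the coverage excerpts and is not reproduced here.

```python
import random

# Hypothetical 3-state Markov reward process (illustrative, not from the paper).
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]      # transition probabilities P[s][s_next]
R = [1.0, 0.0, -1.0]       # deterministic reward received in each state
GAMMA = 0.5                # discount factor


def features(s):
    """One-hot features, so the TD fixed point equals the true value function."""
    return [1.0 if i == s else 0.0 for i in range(3)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def td0_polyak_ruppert(num_steps, c=0.5, seed=0):
    """Run unprojected linear TD(0) on one trajectory; return (last iterate, average)."""
    rng = random.Random(seed)
    theta = [0.0, 0.0, 0.0]       # current TD(0) iterate
    theta_bar = [0.0, 0.0, 0.0]   # Polyak-Ruppert average of the iterates
    s = 0
    for t in range(num_steps):
        s_next = rng.choices(range(3), weights=P[s])[0]
        phi, phi_next = features(s), features(s_next)
        # Temporal-difference error for the observed transition.
        delta = R[s] + GAMMA * dot(theta, phi_next) - dot(theta, phi)
        eta = c / (t + 1) ** 0.5  # assumed schedule, NOT the paper's
        theta = [th + eta * delta * p for th, p in zip(theta, phi)]
        # Running mean of the iterates produced so far (no projection step).
        theta_bar = [tb + (th - tb) / (t + 1) for tb, th in zip(theta_bar, theta)]
        s = s_next
    return theta, theta_bar


def true_values(iters=200):
    """Solve V = R + GAMMA * P V by fixed-point iteration, for comparison."""
    v = [0.0, 0.0, 0.0]
    for _ in range(iters):
        v = [R[s] + GAMMA * dot(P[s], v) for s in range(3)]
    return v


if __name__ == "__main__":
    last, averaged = td0_polyak_ruppert(20000)
    print("true values     :", true_values())
    print("last iterate    :", last)
    print("averaged iterate:", averaged)
```

Running the script prints the exact values next to the last and averaged iterates, so the effect of averaging can be inspected directly. The sketch only illustrates the algorithmic template. It says nothing about the rates or the high-probability guarantees, which depend on the paper's schedule and analysis.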

IMPACT This research could lead to more efficient and stable reinforcement learning algorithms.

RANK_REASON The cluster contains a research paper detailing a new algorithm and theoretical analysis.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv stat.ML TIER_1 English(EN) · Wei-Cheng Lee, Francesco Orabona

    A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak--Ruppert Averaging

    arXiv:2606.24981v1 (cross-listed). Abstract: We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain unprojected TD(0) algorithm with Polyak-Ruppert (PR) averaging, using a single …

  2. arXiv stat.ML TIER_1 English(EN) · Francesco Orabona

    A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak--Ruppert Averaging

    We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain unprojected TD(0) algorithm with Polyak-Ruppert (PR) averaging, using a single stepsize schedule $η_t \propto \frac{1}{τ_{\mathrm…