New TD(0) algorithm achieves simultaneous robust and fast convergence rates

By PulseAugur Editorial · [2 sources] · 2026-06-23 13:37

Researchers have analyzed plain, unprojected linear TD(0) with Polyak-Ruppert averaging under Markovian sampling, where the data are generated along a single trajectory. Using a single stepsize schedule, the method comes with high-probability guarantees that deliver both robust, curvature-free convergence rates and fast, curvature-dependent rates simultaneously. The analysis relies on a novel toolkit for geometrically mixing Markov chains, which decomposes Markov noise into a martingale term and a controlled remainder, enabling a new self-bounding inductive argument for pathwise stability.
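To make the setting concrete, here is a minimal Python sketch of unprojected linear TD(0) with Polyak-Ruppert averaging run along a single Markov trajectory. It is an illustration under stated assumptions, not the authors' implementation: the toy 3-state chain, the rewards, the features, the discount factor, and the helper names (`example_mrp`, `td_fixed_point`, `td0_polyak_ruppert`) are all hypothetical. In particular, the stepsize `c / sqrt(t + 1)` is only a placeholder, since the schedule from the paper is truncated in the abstract excerpt quoted further down and is not reproduced here.

```python
import numpy as np


def example_mrp():
    """Toy 3-state Markov reward process (all numbers are made up)."""
    P = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.2, 0.6]])      # transition matrix, rows sum to 1
    r = np.array([1.0, 0.0, 2.0])        # reward received in each state
    Phi = np.array([[1.0, 0.0],
                    [0.0, 1.0],
                    [0.6, 0.6]])         # one 2-d feature row per state
    gamma = 0.5                          # discount factor
    return P, r, Phi, gamma


def td_fixed_point(P, r, Phi, gamma):
    """Solve A theta = b for the TD fixed point, for comparison only."""
    pi = np.full(len(r), 1.0 / len(r))
    for _ in range(1000):                # power iteration for the stationary law
        pi = pi @ P
    D = np.diag(pi)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def td0_polyak_ruppert(P, r, Phi, gamma, num_steps, c=0.5, seed=0):
    """Unprojected linear TD(0) on one trajectory, plus the running average.

    The stepsize c / sqrt(t + 1) is an assumed placeholder schedule.
    """
    rng = np.random.default_rng(seed)
    n_states, dim = Phi.shape
    theta = np.zeros(dim)                # last iterate
    theta_bar = np.zeros(dim)            # Polyak-Ruppert average of the iterates
    s = 0
    for t in range(num_steps):
        s_next = rng.choice(n_states, p=P[s])   # Markovian sampling, no resets
        eta = c / np.sqrt(t + 1)
        td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + eta * td_error * Phi[s]  # no projection step
        theta_bar += (theta - theta_bar) / (t + 1)
        s = s_next
    return theta, theta_bar


if __name__ == "__main__":
    P, r, Phi, gamma = example_mrp()
    theta_star = td_fixed_point(P, r, Phi, gamma)
    theta_last, theta_bar = td0_polyak_ruppert(P, r, Phi, gamma, num_steps=20000)
    print("last-iterate error:", np.linalg.norm(theta_last - theta_star))
    print("averaged error:    ", np.linalg.norm(theta_bar - theta_star))
```

The sketch keeps both the last iterate and the running average so the two can be compared against the fixed point on the same trajectory. Nothing here reproduces the paper's rates or its analysis.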

IMPACT This research could lead to more efficient and stable reinforcement learning algorithms.

RANK_REASON The cluster contains a research paper detailing a new algorithm and theoretical analysis.


AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

arXiv stat.ML TIER_1 English(EN) · Wei-Cheng Lee, Francesco Orabona · 2026-06-25 04:00

A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak--Ruppert Averaging

arXiv:2606.24981v1 Announce Type: cross Abstract: We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain unprojected TD(0) algorithm with Polyak-Ruppert (PR) averaging, using a single …

arXiv stat.ML TIER_1 English(EN) · Francesco Orabona · 2026-06-23 13:37

A Single Stepsize Suffices for Unprojected Linear TD(0): Simultaneous Robust and Fast Rates via Polyak--Ruppert Averaging

We study linear TD(0) under Markovian sampling, where data are generated along a single trajectory. We provide high-probability guarantees for a plain unprojected TD(0) algorithm with Polyak-Ruppert (PR) averaging, using a single stepsize schedule $η_t \propto \frac{1}{τ_{\mathrm…
