Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 15h

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

A new research paper analyzes the variance of temporal difference (TD) learning, a value-estimation method used in reinforcement learning. The study shows that TD learning reduces variance by aggregating information from multiple trajectories, and that shorter update horizons yield lower variance for a given number of samples. The paper also presents Direct Advantage Estimation (DAE) as a regression-adjusted control variate that offers a tighter variance bound than TD when many samples are available.
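The claim that TD reduces variance by pooling trajectories can be illustrated with a toy simulation. The sketch below is not taken from the paper. It assumes a hypothetical three-state process in which start states A and B both move to a shared state C with zero reward, and C then terminates with a unit-variance Gaussian reward. The Monte Carlo (MC) estimate of V(A) averages only the returns of trajectories that started in A, while a batch TD(0) estimate bootstraps from V(C), which is averaged over every visit to C. The function name and parameters are illustrative choices.

```python
import random
import statistics


def simulate_estimates(n_per_start=10, n_trials=2000, seed=0):
    """Compare the variance of MC and batch TD(0) estimates of V(A).

    Hypothetical toy process (an assumption, not the paper's setup):
    A -> C and B -> C with reward 0, then C terminates with reward ~ N(0, 1).
    """
    rng = random.Random(seed)
    mc_estimates, td_estimates = [], []
    for _ in range(n_trials):
        # Terminal rewards seen after C, split by the trajectory's start state.
        rewards_from_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_start)]
        rewards_from_b = [rng.gauss(0.0, 1.0) for _ in range(n_per_start)]

        # MC: average return over trajectories that started in A only.
        mc_estimates.append(statistics.fmean(rewards_from_a))

        # Batch TD(0): V(A) = 0 + V(C), with V(C) averaged over all visits
        # to C, including the trajectories that started in B.
        td_estimates.append(statistics.fmean(rewards_from_a + rewards_from_b))

    # Variance of each estimator across independent repetitions.
    return statistics.pvariance(mc_estimates), statistics.pvariance(td_estimates)


mc_var, td_var = simulate_estimates()
print(f"MC variance: {mc_var:.4f}")
print(f"TD variance: {td_var:.4f}")
```

In this setup the TD estimate uses twice as many samples of the noisy reward as the MC estimate, so its variance should come out at roughly half. The gap would grow if more start states funnelled into C.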

IMPACT Lower-variance value and advantage estimates could make reinforcement learning agents more stable and sample-efficient to train.

Temporal Difference (TD) learning
Direct Advantage Estimation (DAE)
Monte Carlo (MC) estimators