On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

Research paper analyzes the variance of temporal difference learning

By the PulseAugur editorial team · [1 source] · 2026-06-18 15:20

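A new research paper analyzes the variance of temporal difference (TD) learning, a method used in reinforcement learning. It shows that TD learning reduces variance by aggregating information from multiple trajectories, and that for a given number of samples, a shorter update horizon leads to lower variance. The paper also proposes Direct Advantage Estimation (DAE) as a regression-adjusted control variate whose variance bound is tighter than that of TD when the sample size is large.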
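To make the control-variate idea in the summary concrete, here is a minimal Python sketch of a generic regression-adjusted control variate. It is not the paper's DAE estimator and does not reproduce its phased TD setting. The toy model (a target `Y` that is linearly correlated with a control variate `C` of known mean 0) and the function names `plain_and_adjusted_means` and `compare_variances` are assumptions made purely for illustration.

```python
import random
import statistics


def plain_and_adjusted_means(n, rng):
    """Return (plain sample mean, regression-adjusted mean) of Y from n draws.

    Toy model (an assumption for illustration only): C ~ N(0, 1) has known
    mean 0, and Y = 2*C + noise with noise ~ N(0, 0.5^2), so E[Y] = 0.
    """
    c = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [2.0 * ci + rng.gauss(0.0, 0.5) for ci in c]
    y_bar = statistics.fmean(y)
    c_bar = statistics.fmean(c)
    # Regression coefficient beta = Cov(Y, C) / Var(C), fitted from the sample.
    cov_yc = sum((yi - y_bar) * (ci - c_bar) for yi, ci in zip(y, c)) / (n - 1)
    beta = cov_yc / statistics.variance(c)
    # Subtract the fitted multiple of C's deviation from its known mean (0).
    adjusted = y_bar - beta * (c_bar - 0.0)
    return y_bar, adjusted


def compare_variances(n=200, repeats=500, seed=0):
    """Empirical variance of both estimators across repeated experiments."""
    rng = random.Random(seed)
    results = [plain_and_adjusted_means(n, rng) for _ in range(repeats)]
    plain_var = statistics.variance([p for p, _ in results])
    adjusted_var = statistics.variance([a for _, a in results])
    return plain_var, adjusted_var


if __name__ == "__main__":
    plain_var, adjusted_var = compare_variances()
    print(f"plain: {plain_var:.5f}  adjusted: {adjusted_var:.5f}")
```

In this toy model most of the spread in `Y` comes from `C`, so removing the fitted `C` component should leave only the residual noise in the estimate. How closely this carries over to advantage estimation depends on the paper's own analysis.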

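Impact: By improving variance reduction techniques, this research could lead to more stable and more sample-efficient reinforcement learning agents.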

Ranking rationale: An academic paper on reinforcement learning algorithms.


AI-generated summary · Google Gemini · based on 1 source.

Sources [1]

arXiv cs.LG · Bernhard Schölkopf · 2026-06-18 15:20

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

We analyze the variance of temporal difference (TD) learning using the phased setting with tabular representation, and show that one of the mechanisms behind its ability to reduce variance is by effectively aggregating over a larger number of independent trajectories. Based on th…
