Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 8h

Finite-Time Convergence of Distributionally Robust Q-Learning with Linear Function Approximation

Researchers have developed a new algorithm for Distributionally Robust Reinforcement Learning (DRRL) that provides finite-time convergence guarantees even with linear function approximation. The algorithm addresses a limitation of existing DRRL methods, which typically require tabular settings or specific structural assumptions. The approach combines a target network with a dual function-approximation scheme, using moment-tracking critics and suffix averaging to converge to the optimal robust Q-function.
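The brief does not give the paper's update rule, so what follows is only a rough, hypothetical sketch of how the named ingredients (a linear Q-function, a target network, a robust Bellman target, and suffix averaging) can fit together. It assumes a KL-divergence uncertainty set of radius rho, whose worst-case expectation has the standard dual form sup over beta > 0 of -beta log E[exp(-V/beta)] - beta rho, and it estimates that exponential moment directly from several next-state samples drawn from a nominal simulator. The paper's moment-tracking critics and dual function-approximation scheme are not reproduced here. All names (`kl_robust_value`, `robust_linear_q`, `rho`, `target_every`, `suffix_frac`) are illustrative.

```python
import math
import random


def kl_robust_value(samples, rho, betas=None):
    """Approximate worst-case mean of `samples` over all distributions within
    KL radius `rho` of the empirical distribution, via the dual form
        sup_{beta > 0}  -beta * log E[exp(-V / beta)] - beta * rho,
    with the sup taken over a coarse grid of beta values (an assumption)."""
    if betas is None:
        betas = [10.0 ** k for k in range(-3, 4)]
    v_min = min(samples)
    n = len(samples)
    best = -math.inf
    for beta in betas:
        # log E[exp(-V/beta)], shifted by v_min so exp() cannot overflow
        mean_shifted = sum(math.exp(-(v - v_min) / beta) for v in samples) / n
        log_moment = math.log(mean_shifted) - v_min / beta
        best = max(best, -beta * log_moment - beta * rho)
    return best


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def robust_linear_q(phi, dim, states, n_actions, reward, sample_next,
                    gamma=0.9, rho=0.1, alpha=0.1, steps=4000,
                    target_every=100, n_next=8, suffix_frac=0.5, seed=0):
    """Hypothetical sketch: linear Q(s, a) = theta . phi(s, a), trained toward
    a KL-robust Bellman target built from a frozen target network; returns
    the average of the iterates over the last `suffix_frac` of training."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    target = list(theta)                 # frozen target-network weights
    suffix_start = int(steps * (1 - suffix_frac))
    suffix_sum = [0.0] * dim
    for t in range(steps):
        s = rng.choice(states)
        a = rng.randrange(n_actions)
        # Greedy next-state values under the target network, one per next
        # state sampled from the nominal model (assumed simulator access).
        next_vals = []
        for _ in range(n_next):
            s2 = sample_next(s, a, rng)
            next_vals.append(max(dot(target, phi(s2, b)) for b in range(n_actions)))
        y = reward(s, a) + gamma * kl_robust_value(next_vals, rho)
        x = phi(s, a)
        td_error = y - dot(theta, x)
        theta = [w + alpha * td_error * xi for w, xi in zip(theta, x)]
        if (t + 1) % target_every == 0:
            target = list(theta)         # periodic target-network sync
        if t >= suffix_start:
            suffix_sum = [u + w for u, w in zip(suffix_sum, theta)]
    n_avg = steps - suffix_start
    return [u / n_avg for u in suffix_sum]
```

As a usage example, one-hot features on a two-state, two-action toy problem (effectively the tabular special case of linear approximation) can be trained with `robust_linear_q(phi, 4, [0, 1], 2, reward=lambda s, a: 1.0 if a == 1 else 0.0, sample_next=lambda s, a, rng: rng.randrange(2))`; the action-1 weights should end up roughly one reward unit above the action-0 weights. Returning the suffix average instead of the last iterate smooths out step-to-step noise from the sampled targets.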

IMPACT Provides theoretical guarantees for robust reinforcement learning, potentially improving agent performance in uncertain environments.

arXiv
reinforcement learning
Markov Chains
Q-function
Saptarshi Mandal
Distributionally Robust Q-Learning
Bellman update
Lipschitz function