Brief · PulseAugur

TOOL · arXiv cs.LG English(EN) · 8h

Q-Learning with Fine-Grained Gap-Dependent Regret

Researchers have developed new Q-learning algorithms with more precise regret bounds for episodic tabular Markov Decision Processes. The work addresses limitations of existing methods by providing fine-grained, gap-dependent regret guarantees. The study introduces a novel analytical framework and proposes two algorithms, ULCB-Hoeffding and a refined version of AMB, which it reports as improving on prior methods in both performance and theoretical rigor.
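For readers unfamiliar with the term, "gap-dependent" refers to the suboptimality gap of each step, state and action: the optimal value at that state minus the optimal Q-value of the action. The sketch below computes these gaps on a toy episodic tabular MDP by backward induction. It uses the standard textbook definition and is not taken from the paper. The function name `suboptimality_gaps` and the two-state, two-action MDP are hypothetical, chosen only for illustration.

```python
import numpy as np

def suboptimality_gaps(P, R):
    """Return optimal Q-values and gaps for an episodic tabular MDP.

    P: array of shape (H, S, A, S), transition probabilities per step.
    R: array of shape (H, S, A), deterministic rewards per step.
    Gap definition (standard): gap[h, s, a] = V*[h, s] - Q*[h, s, a].
    """
    H, S, A = R.shape
    Q = np.zeros((H, S, A))
    V_next = np.zeros(S)  # value after the final step is zero
    for h in reversed(range(H)):
        # Bellman optimality backup: reward plus expected next-step value.
        Q[h] = R[h] + P[h] @ V_next
        V_next = Q[h].max(axis=1)
    V = Q.max(axis=2)
    gaps = V[:, :, None] - Q
    return Q, gaps

# Toy MDP (illustrative assumption): 2 steps, 2 states, 2 actions,
# where taking action a moves deterministically to state a.
H, S, A = 2, 2, 2
P = np.zeros((H, S, A, S))
for a in range(A):
    P[:, :, a, a] = 1.0
R = np.tile(np.array([[1.0, 0.0], [0.5, 0.5]]), (H, 1, 1))

Q_star, gaps = suboptimality_gaps(P, R)
print(gaps)
```

Optimal actions have a gap of zero, and larger positive gaps mark actions that are easier to rule out. A gap-dependent regret bound is one that scales with these quantities rather than only with the worst case over all MDPs of a given size.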

Q-learning
Markov decision processes
UCB-Hoeffding
ULCB-Hoeffding
Haochen Zhang