Model-Based Learning of Whittle indices
Researchers have developed BLINQ, a novel model-based algorithm designed to learn Whittle indices for Markov Decision Processes. This new approach constructs an empirical estimate of the MDP and then computes the indices, offering a proven convergence guarantee and a bound on learning time. Numerical experiments indicate BLINQ requires fewer samples than existing Q-learning methods for accurate approximations and has a lower overall computational cost. AI