Online Learning in MDPs with Partially Adversarial Transitions and Losses
Researchers have developed new algorithms for reinforcement learning in environments with partially adversarial transitions. These algorithms utilize "conditioned occupancy measures" to maintain stability across episodes, even when facing adversarial behavior at specific points. The proposed methods achieve improved regret bounds compared to existing approaches, with one algorithm offering a reduction in regret that removes the need to identify the adversarial steps.
AI IMPACT: Introduces novel algorithms for reinforcement learning in complex environments, potentially improving agent performance in scenarios with unpredictable elements.
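
As background for the term "occupancy measure", the sketch below computes the standard occupancy measure of a fixed policy in a small finite-horizon MDP and uses it to evaluate an expected loss. It is not the paper's "conditioned" construction, which the summary does not define. The function names, the toy two-state MDP, and the loss table are all hypothetical and chosen only for illustration.

```python
# Minimal sketch (assumed setup, not the paper's algorithm):
# q[h][s][a] = probability of being in state s and taking action a at step h.

def occupancy_measure(P, policy, mu0):
    """P[h][s][a][t]: transition probability to state t at step h.
    policy[h][s][a]: probability of action a in state s at step h.
    mu0[s]: initial state distribution."""
    H, S, A = len(P), len(mu0), len(policy[0][0])
    q = []
    state_dist = list(mu0)
    for h in range(H):
        # Joint probability of (state, action) at step h.
        q_h = [[state_dist[s] * policy[h][s][a] for a in range(A)]
               for s in range(S)]
        q.append(q_h)
        # Push the joint distribution through the step-h transition kernel.
        state_dist = [
            sum(q_h[s][a] * P[h][s][a][t] for s in range(S) for a in range(A))
            for t in range(S)
        ]
    return q


def expected_loss(q, loss):
    """Expected cumulative loss is linear in the occupancy measure."""
    return sum(
        q[h][s][a] * loss[h][s][a]
        for h in range(len(q))
        for s in range(len(q[h]))
        for a in range(len(q[h][s]))
    )


# Toy MDP (hypothetical): 2 states, 2 actions, horizon 2.
# Action 0 stays in the current state, action 1 switches state.
H, S, A = 2, 2, 2
P = [[[[1.0 if t == (s if a == 0 else 1 - s) else 0.0 for t in range(S)]
       for a in range(A)] for s in range(S)] for h in range(H)]
policy = [[[0.5, 0.5] for s in range(S)] for h in range(H)]  # uniform policy
mu0 = [1.0, 0.0]                                             # start in state 0
loss = [[[1.0 if s == 1 else 0.0 for a in range(A)]
         for s in range(S)] for h in range(H)]               # pay 1 in state 1

q = occupancy_measure(P, policy, mu0)
total = expected_loss(q, loss)
```

Because the expected loss is linear in `q`, occupancy-measure methods typically optimize over `q` directly rather than over the policy, which is the usual reason they appear in regret analyses of this kind.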