PulseAugur

New Pareto Q-Learning algorithm enhances multi-objective RL with reward machines · arXiv cs.AI

Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose complex reward structure is defined by reward machines (RMs). PQLRM integrates Pareto Q-Learning (PQL), which maintains vector-valued Q-estimates to approximate the Pareto front, with Q-Learning with Reward Machines (QRM), which exploits the automaton structure of the reward signal. The result is a multi-policy algorithm that remains sample-efficient even when rewards are non-Markovian and RM-encoded. Experimental results indicate that it converges faster than a standard Pareto Q-Learning baseline and can produce Pareto-optimal policies that QRM alone cannot.
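To make "RM-encoded, non-Markovian rewards" concrete, here is a minimal sketch of a reward machine with vector-valued rewards. The class name, the two-objective coffee/office task and the zero-reward self-loop default are all illustrative assumptions, not taken from the paper. The `counterfactual` method mirrors the general QRM idea of reusing one environment event to produce an experience for every RM state; how PQLRM itself combines this with Pareto sets is not shown here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]  # one entry per objective


@dataclass
class RewardMachine:
    """Hypothetical minimal reward machine with vector-valued rewards.

    delta maps (rm_state, event) -> (next_rm_state, reward_vector).
    Any (state, event) pair not in delta is a self-loop with zero reward.
    """
    states: List[str]
    initial: str
    n_objectives: int
    delta: Dict[Tuple[str, str], Tuple[str, Vector]] = field(default_factory=dict)

    def step(self, u: str, event: str) -> Tuple[str, Vector]:
        zero = tuple(0.0 for _ in range(self.n_objectives))
        return self.delta.get((u, event), (u, zero))

    def run(self, events: Sequence[str]) -> Tuple[str, Vector]:
        """Feed a trace of events; return the final RM state and the summed reward vector."""
        u, total = self.initial, [0.0] * self.n_objectives
        for e in events:
            u, r = self.step(u, e)
            total = [t + x for t, x in zip(total, r)]
        return u, tuple(total)

    def counterfactual(self, event: str) -> List[Tuple[str, str, Vector]]:
        """QRM-style reuse: the (u, u', r) triple every RM state would yield for this event."""
        return [(u, *self.step(u, event)) for u in self.states]


# Illustrative two-objective task (assumed, not from the paper):
# reaching the office after fetching coffee pays 1.0 on objective 0,
# reaching it directly pays 0.5 on objective 1.
rm = RewardMachine(
    states=["u0", "u1", "u2"],
    initial="u0",
    n_objectives=2,
    delta={
        ("u0", "coffee"): ("u1", (0.0, 0.0)),
        ("u1", "office"): ("u2", (1.0, 0.0)),
        ("u0", "office"): ("u0", (0.0, 0.5)),
    },
)

print(rm.run(["coffee", "office"]))  # ('u2', (1.0, 0.0))
print(rm.run(["office"]))            # ('u0', (0.0, 0.5))
```

The same "office" event yields different reward vectors depending on the RM state, so the reward is not a function of the environment state alone; tracking the RM state restores the Markov property for the learner.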

IMPACT Introduces a more sample-efficient approach for multi-objective reinforcement learning tasks with structured rewards.

RANK_REASON Academic paper detailing a new algorithm.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

  1. arXiv cs.AI · TIER_1 · English (EN) · Léo Saulières

    Pareto Q-Learning with Reward Machines

    We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates…
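The "sets of vector-valued Q-estimates" kept by PQL can be illustrated with a small sketch of Pareto-dominance filtering and a set-based Q update. It assumes every objective is maximised; the function names `dominates`, `non_dominated`, `nd_state` and `q_set` are hypothetical, and the update shown (shift each non-dominated successor vector by the average immediate reward, discounted by `gamma`) is the commonly described PQL scheme, not the paper's code. Action selection and the reward-machine extension are omitted.

```python
from typing import Iterable, List, Tuple

Vector = Tuple[float, ...]  # one entry per objective


def dominates(a: Vector, b: Vector) -> bool:
    """True if a Pareto-dominates b (maximisation): a >= b everywhere and > somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(vectors: Iterable[Vector]) -> List[Vector]:
    """Keep only vectors not dominated by any other; exact duplicates are collapsed."""
    unique = list(dict.fromkeys(vectors))  # order-preserving de-duplication
    return [v for v in unique if not any(dominates(w, v) for w in unique)]


def nd_state(q_sets: Iterable[Iterable[Vector]]) -> List[Vector]:
    """ND(s): non-dominated union of the Q-sets of all actions available in s."""
    return non_dominated(v for qs in q_sets for v in qs)


def q_set(avg_reward: Vector, gamma: float, nd_next: Iterable[Vector]) -> List[Vector]:
    """Set-based Q(s, a): average immediate reward plus gamma times each ND successor vector."""
    nd = list(nd_next)
    if not nd:  # terminal or unvisited successor: only the immediate reward
        return [avg_reward]
    return [tuple(r + gamma * v for r, v in zip(avg_reward, vec)) for vec in nd]


candidates = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4), (1.0, 0.0)]
print(non_dominated(candidates))  # [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
print(q_set((0.0, 0.0), 0.5, [(1.0, 0.0), (0.0, 1.0)]))  # [(0.5, 0.0), (0.0, 0.5)]
```

Because each Q-estimate is a set of trade-off vectors rather than a single number, one learning run can retain several Pareto-optimal policies at once, which is what makes the approach multi-policy.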