New Pareto Q-Learning algorithm enhances multi-objective RL with reward machines · arXiv cs.AI

By PulseAugur Editorial · [1 source] · 2026-06-17 14:44

Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains vector-valued Q-estimates to approximate the Pareto front, with Q-Learning with Reward Machines (QRM), which exploits the automaton structure of the reward signal. The result is a multi-policy algorithm that remains sample-efficient even when rewards are non-Markovian and RM-encoded. In the reported experiments, PQLRM converges faster than a standard Pareto Q-Learning baseline and finds Pareto-optimal policies that QRM alone cannot.
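To make "vector-valued Q-estimates" and "Pareto front" concrete, here is a minimal Python sketch of the non-dominated filtering step that Pareto-based methods rely on. It is an illustration under stated assumptions, not the paper's implementation: the names `dominates` and `pareto_front` and the sample vectors are hypothetical, and every objective is assumed to be maximized.

```python
from typing import List, Tuple

Vector = Tuple[float, ...]  # one value estimate per objective (assumed: higher is better)


def dominates(a: Vector, b: Vector) -> bool:
    """Return True if a is at least as good as b on every objective
    and strictly better on at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(vectors: List[Vector]) -> List[Vector]:
    """Keep only the vectors that no other vector dominates.
    Duplicates are removed while preserving first-seen order."""
    unique = list(dict.fromkeys(vectors))
    return [v for v in unique if not any(dominates(u, v) for u in unique)]


# Hypothetical two-objective value estimates for one state-action pair.
candidates = [(1.0, 5.0), (2.0, 4.0), (1.5, 3.0), (3.0, 1.0), (2.0, 4.0)]
front = pareto_front(candidates)
print(front)  # [(1.0, 5.0), (2.0, 4.0), (3.0, 1.0)]
```

Here `(1.5, 3.0)` is dropped because `(2.0, 4.0)` is better on both objectives, while the three remaining vectors each represent a different trade-off. A multi-policy algorithm keeps all of them instead of collapsing them into a single scalar.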

IMPACT Introduces a more sample-efficient approach for multi-objective reinforcement learning tasks with structured rewards.

RANK_REASON Academic paper detailing a new algorithm.



AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · Léo Saulières · 2026-06-17 14:44

Pareto Q-Learning with Reward Machines

We present Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a set of reward machines (RMs). PQLRM combines Pareto Q-Learning (PQL), which maintains sets of vector-valued Q-estimates…
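As a rough illustration of what a reward machine is, the sketch below models one as a small finite automaton whose transitions emit rewards, so the reward depends on the history of events and not only on the current observation. The class name `RewardMachine`, the state labels, and the "coffee then office" task are hypothetical choices for this example and are not taken from the paper.

```python
from typing import Dict, List, Tuple


class RewardMachine:
    """A finite automaton over high-level events; each transition emits a reward."""

    def __init__(self, initial: str, transitions: Dict[Tuple[str, str], Tuple[str, float]]):
        self.initial = initial
        self.transitions = transitions  # (state, event) -> (next_state, reward)
        self.state = initial

    def reset(self) -> None:
        self.state = self.initial

    def step(self, event: str) -> float:
        # Assumption: events with no listed transition leave the state unchanged
        # and yield zero reward.
        next_state, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward


# Hypothetical task: reward 1.0 only if "coffee" is observed before "office".
rm = RewardMachine(
    initial="u0",
    transitions={
        ("u0", "coffee"): ("u1", 0.0),
        ("u1", "office"): ("u2", 1.0),
    },
)

rewards_in_order: List[float] = [rm.step(e) for e in ["coffee", "office"]]
rm.reset()
rewards_out_of_order: List[float] = [rm.step(e) for e in ["office", "coffee"]]
print(rewards_in_order, rewards_out_of_order)  # [0.0, 1.0] [0.0, 0.0]
```

The same "office" event yields a different reward depending on the automaton state, which is what makes the reward non-Markovian with respect to the environment state alone. In a multi-objective setting with a set of RMs, one such machine per objective would produce one component of the reward vector.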

