Brief · PulseAugur

RESEARCH · arXiv cs.AI English(EN) · 18h · [2 sources]

Pareto Q-Learning with Reward Machines

Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by a reward machine. It combines Pareto Q-Learning, which maintains vector-valued Q-estimates to approximate the Pareto front, with techniques from Q-Learning with Reward Machines that exploit the automaton structure of the reward signal. PQLRM targets sample efficiency in environments with non-Markovian rewards encoded by reward machines, and it is reported to converge faster and to synthesize Pareto-optimal policies that other methods cannot.
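To make the two ingredients concrete, here is a minimal Python sketch of a reward machine that emits reward vectors, plus the Pareto (non-dominated) filter that a set-based backup relies on. Everything in it is an illustrative assumption rather than the paper's implementation: the `RewardMachine` class, the labels `a`, `b`, `c`, the two-objective rewards, and the simplified deterministic `backup` function are all hypothetical, and PQLRM's actual update rule may differ.

```python
from typing import Dict, Iterable, List, Set, Tuple

Vec = Tuple[float, ...]


def dominates(a: Vec, b: Vec) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Set[Vec]) -> Set[Vec]:
    """Keep only the non-dominated vectors."""
    return {p for p in points if not any(dominates(q, p) for q in points)}


class RewardMachine:
    """Toy automaton (hypothetical encoding): (state, label) -> (next state, reward vector)."""

    def __init__(self, initial: str, delta: Dict[Tuple[str, str], Tuple[str, Vec]]):
        self.initial = initial
        self.delta = delta

    def step(self, u: str, label: str) -> Tuple[str, Vec]:
        # Assumption: unlisted (state, label) pairs keep the state and pay zero reward.
        return self.delta.get((u, label), (u, (0.0, 0.0)))


def run(rm: RewardMachine, labels: List[str]) -> Tuple[str, Vec]:
    """Feed a label trace through the machine and sum the reward vectors."""
    u, total = rm.initial, (0.0, 0.0)
    for label in labels:
        u, r = rm.step(u, label)
        total = tuple(t + x for t, x in zip(total, r))
    return u, total


def backup(reward: Vec, gamma: float, next_sets: Iterable[Set[Vec]]) -> Set[Vec]:
    """Simplified deterministic set-based backup in the spirit of Pareto Q-Learning:
    reward + gamma * (non-dominated vectors across the successor's actions)."""
    candidates = set().union(*next_sets)
    # With no successor vectors (e.g. a terminal state), fall back to the zero vector.
    front = pareto_front(candidates) or {tuple(0.0 for _ in reward)}
    return {tuple(r + gamma * v for r, v in zip(reward, vec)) for vec in front}


# Objective 1: observe "a" and then "b". Objective 2: observe "c" from the start state.
rm = RewardMachine(
    "u0",
    {
        ("u0", "a"): ("u1", (0.0, 0.0)),
        ("u1", "b"): ("u2", (1.0, 0.0)),
        ("u0", "c"): ("u2", (0.0, 1.0)),
    },
)

print(run(rm, ["b"]))       # "b" alone pays nothing: ('u0', (0.0, 0.0))
print(run(rm, ["a", "b"]))  # "b" after "a" pays objective 1: ('u2', (1.0, 0.0))
print(pareto_front({(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4)}))
```

The same label `b` pays differently depending on the machine state, which is what makes the reward non-Markovian over environment observations alone. Under this sketch's assumptions, indexing the Q-sets by the pair (environment state, machine state) would make the reward Markovian again, and `backup` would then operate on those paired states.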

IMPACT Enhances sample efficiency and policy synthesis in multi-objective reinforcement learning tasks with complex reward structures.

Pareto Q-Learning with Reward Machines
Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
Pareto Q-Learning
Q-Learning with Reward Machines