Researchers have introduced Pareto Q-Learning with Reward Machines (PQLRM), a multi-objective reinforcement learning algorithm for tasks whose reward structure is specified by reward machines (RMs). PQLRM combines Pareto Q-Learning, which maintains vector-valued Q-estimates to approximate the Pareto front, with Q-Learning with Reward Machines (QRM), which exploits the automaton structure of the reward signal. The result is a multi-policy algorithm that remains sample-efficient even when rewards are non-Markovian and RM-encoded. Experimental results indicate that it converges faster than a standard Pareto Q-Learning baseline and can produce Pareto-optimal policies that QRM alone cannot.
AI IMPACT Introduces a more sample-efficient approach for multi-objective reinforcement learning tasks with structured rewards.
RANK_REASON Academic paper detailing a new algorithm.
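The two ingredients named in the summary, a reward machine that emits vector-valued rewards and a Pareto filter over value vectors, can be illustrated with a minimal sketch. This is not the paper's implementation: the class `RewardMachine`, the helpers `dominates` and `pareto_front`, the state names `u0`/`u1`/`u2`, the events `key`/`door`, and all reward values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set, Tuple

Vector = Tuple[float, ...]


def dominates(a: Vector, b: Vector) -> bool:
    """True if a is at least as good as b on every objective and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: Iterable[Vector]) -> Set[Vector]:
    """Keep only the vectors that no other vector dominates."""
    unique = set(points)
    return {p for p in unique if not any(dominates(q, p) for q in unique)}


@dataclass
class RewardMachine:
    """Toy reward machine (hypothetical): a finite automaton over high-level events.

    transitions[(rm_state, event)] = (next_rm_state, reward_vector)
    """
    transitions: Dict[Tuple[str, str], Tuple[str, Vector]]
    initial: str = "u0"
    default_reward: Vector = (0.0, 0.0)

    def step(self, u: str, event: str) -> Tuple[str, Vector]:
        # Unlisted (state, event) pairs leave the RM state unchanged with zero reward.
        return self.transitions.get((u, event), (u, self.default_reward))


# Assumed two-objective task: objective 0 rewards opening the door,
# objective 1 penalises picking up the key. The door only pays off after
# the key was seen, so the reward is non-Markovian in the environment state
# alone but Markovian once the RM state is tracked alongside it.
rm = RewardMachine(
    transitions={
        ("u0", "key"): ("u1", (0.0, -1.0)),
        ("u1", "door"): ("u2", (1.0, 0.0)),
    }
)

u = rm.initial
for event in ["door", "key", "door"]:
    u, reward = rm.step(u, event)
    print(event, "->", u, reward)

# Non-dominated candidates among some made-up value vectors.
print(pareto_front([(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (0.4, 0.4)]))
```

In a Pareto Q-Learning style agent, a filter like `pareto_front` would be applied to sets of vector-valued estimates rather than to single scalars, and in a QRM style agent those estimates would be indexed by the pair of environment state and RM state. How PQLRM combines the two in detail is described in the paper, not here.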
- alphaXiv
- arXiv
- CatalyzeX
- DagsHub
- Gotit.pub
- Hugging Face
- IArxiv
- Influence Flower
- Markov decision process
- Pareto Q-Learning
- Pareto Q-Learning with Reward Machines
- Q-Learning with Reward Machines
- Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
- ScienceCast
AI-generated summary · Google Gemini · from 1 source.