Brief · PulseAugur

TOOL · arXiv cs.LG · English (EN) · 4d

MoralityGym: A Benchmark for Evaluating Hierarchical Moral Alignment in Sequential Decision-Making Agents

Researchers have introduced MoralityGym, a benchmark for evaluating how well AI agents navigate ethical dilemmas and adhere to hierarchical moral norms. The benchmark uses a new formalism, Morality Chains, to represent ethical constraints, and presents 98 trolley-dilemma-style problems as Gymnasium environments. Initial tests with Safe RL methods exposed limitations in the agents' ethical reasoning, suggesting that more advanced approaches are needed before AI systems can be relied on to behave ethically and transparently in real-world scenarios.

IMPACT Provides a benchmark for developing and testing sequential decision-making agents against hierarchically ordered moral norms.

Gymnasium
Morality Chains
Safe RL
MoralityGym