Probing Outcome-Level Resemblance and Mechanism-Level Alignment in LLM Risk Decisions: Evidence from the St. Petersburg Game
A new research paper explores whether Large Language Models (LLMs) truly align with human decision-making mechanisms when faced with risk, using the St. Petersburg game as a testbed. While many LLMs produce human-like finite bids in the original game, this outcome-level resemblance often hides differing underlying reasoning processes. Controlled variants of the game reveal that LLMs frequently shift to conditionally rational behavior rather than maintaining human-consistent mechanisms, even after instruction tuning.
AI IMPACT Highlights the need for deeper evaluation of LLM decision-making beyond surface-level outcomes to ensure true alignment.
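For readers unfamiliar with the testbed, the sketch below illustrates why a finite bid in the St. Petersburg game is notable. It assumes the common textbook convention, which the summary above does not spell out: a fair coin is tossed until the first heads, and a first heads on toss k pays 2^k. The cap `max_tosses` and the function name `truncated_expected_value` are hypothetical, introduced only for illustration, and the sketch is not taken from the paper.

```python
from fractions import Fraction


def truncated_expected_value(max_tosses: int) -> Fraction:
    """Expected payoff when the game is capped at max_tosses tosses.

    Assumed convention (not from the paper): first heads on toss k pays 2**k,
    and the game pays nothing if no heads appears within the cap.
    """
    total = Fraction(0)
    for k in range(1, max_tosses + 1):
        probability = Fraction(1, 2 ** k)  # k-1 tails followed by one heads
        payoff = 2 ** k
        total += probability * payoff      # each term contributes exactly 1
    return total


if __name__ == "__main__":
    for cap in (10, 100, 1000):
        print(cap, truncated_expected_value(cap))
```

Under these assumptions every possible stopping point adds the same amount to the expectation, so the capped expected value equals the cap and grows without bound as the cap is lifted. A bidder who maximizes expected value alone would therefore accept any price, which is why small finite bids are read as evidence of some other decision mechanism.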