A new research paper compares the exploration-exploitation strategies of large language models (LLMs) and humans using standard multi-armed bandit experiments. The study found that enabling thinking traces shifted LLM behavior toward more human-like exploration in stationary environments. In complex, non-stationary settings, however, LLMs failed to match human adaptability, particularly in directed exploration, even when they achieved comparable regret.
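For readers unfamiliar with the task, the exploration-exploitation trade-off and the regret metric can be illustrated with a minimal epsilon-greedy agent on a stationary multi-armed bandit. This is a generic sketch of the standard paradigm, not the paper's actual protocol; the arm means, epsilon, and horizon are assumed values chosen for illustration.

```python
import random

def epsilon_greedy_bandit(true_means, epsilon=0.1, steps=1000, seed=0):
    """Simulate an epsilon-greedy agent on a stationary multi-armed bandit.

    Returns cumulative expected regret: the gap between always pulling
    the best arm and the arms the agent actually chose.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms          # pulls per arm
    estimates = [0.0] * n_arms     # running mean reward per arm
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(steps):
        # Explore with probability epsilon; otherwise exploit the current best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)   # noisy reward draw
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean
        regret += best_mean - true_means[arm]      # expected per-step regret

    return regret
```

Low cumulative regret on a stationary bandit does not by itself imply human-like behavior; as the summary notes, agents can reach similar regret while differing in how they direct their exploration, which is where the paper reports LLMs diverging from humans.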
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT Highlights limitations of LLMs as human behavior simulators and suggests areas for improvement in complex decision-making.
RANK_REASON Academic paper comparing LLM and human decision-making strategies.