A new study introduces Psych-201, a dataset designed to measure how well large language models mimic human behavior. The researchers found that post-training, the process used to make LLMs more helpful, consistently makes them less aligned with human behavior, and that this misalignment grows with newer model generations even as base capabilities improve. Techniques such as persona-induction, which use participant-specific data to make models more human-like, also fail to improve predictions of individual behavior.
Summary written by gemini-2.5-flash-lite from 1 source.
IMPACT: Suggests current LLM fine-tuning processes may hinder their use as accurate models of human behavior.
RANK_REASON: The cluster contains an academic paper detailing a new dataset and findings about LLM behavior.