PulseAugur

LLMs struggle to simulate real human behavior, new research shows

Two new research papers explore the limitations of current large language models (LLMs) in simulating realistic human behavior. The first paper, "OmniBehavior," introduces a benchmark built on real-world data and finds that LLMs tend to exhibit a positive, homogenized bias and fail to capture individual differences. The second paper, "DITTO," proposes a reinforcement learning (RL) approach that incorporates verbal feedback to improve LLM simulation capabilities. It shows significant gains over the base models and outperforms GPT-5.4 on several benchmarks.

IMPACT New benchmarks and RL techniques highlight LLM limitations in simulating diverse human behaviors, indicating a need for more nuanced training data and feedback mechanisms.

RANK_REASON Two academic papers published on arXiv introduce new benchmarks and methods for evaluating LLM simulation of human behavior.


AI-generated summary · Google Gemini · from 2 sources.


COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang, Yifei Hu, Yong Du, Tingting Gao, Yaojie Lu, Yingfei Sun, Xianpei Han, Le Sun, Xiangyu Wu, Hongyu Lin ·

    Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

    arXiv:2604.08362v2 · Announce Type: replace · Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, fa…

  2. arXiv cs.CL TIER_1 English(EN) · Maarten Sap ·

    Reinforcing Human Behavior Simulation via Verbal Feedback

    Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and…