Two new research papers explore the limitations of current large language models in simulating realistic human behavior. The first paper, "OmniBehavior," introduces a benchmark using real-world data and finds that LLMs tend to exhibit a positive, homogenized bias, failing to capture individual differences. The second paper, "DITTO," proposes a reinforcement learning approach that incorporates verbal feedback to improve LLM simulation capabilities, showing significant gains over base models and outperforming GPT-5.4 on several benchmarks.
Impact: New benchmarks and RL techniques highlight LLM limitations in simulating diverse human behaviors, indicating a need for more nuanced training data and feedback mechanisms.
Ranking rationale: Two academic papers published on arXiv introduce new benchmarks and methods for evaluating LLM simulation of human behavior.
AI-generated summary · Google Gemini · From 2 sources.