PulseAugur

Studies show large language models struggle to simulate real human behavior

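Two new research papers examine the limitations of current large language models (LLMs) in simulating real human behavior. The first paper, "OmniBehavior", introduces a benchmark built on real-world data and finds that LLMs tend to exhibit a positive, homogenizing bias and fail to capture individual differences. The second paper, "DITTO", proposes a reinforcement learning method that incorporates verbal feedback to improve LLMs' simulation ability, reporting significant gains over the base model and outperforming GPT-5.4 on several benchmarks.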

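Impact: The new benchmark and reinforcement learning technique highlight the limitations of large language models in simulating diverse human behavior, suggesting a need for more fine-grained training data and feedback mechanisms.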

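Ranking rationale: Two academic papers published on arXiv introduce a new benchmark and new methods for evaluating how well large language models simulate human behavior.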


AI-generated summary · Google Gemini · from 2 sources.


Sources [2]

  1. arXiv cs.CL TIER_1 English(EN) · Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang, Yifei Hu, Yong Du, Tingting Gao, Yaojie Lu, Yingfei Sun, Xianpei Han, Le Sun, Xiangyu Wu, Hongyu Lin ·

    Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

    arXiv:2604.08362v2. Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, fa…

  2. arXiv cs.CL TIER_1 English(EN) · Maarten Sap ·

    Reinforcing Human Behavior Simulation via Verbal Feedback

    Humans learn social norms and behaviors from verbal feedback (e.g., a parent saying "that was rude" or a friend explaining "here's why that hurt"). Yet, learning from feedback for LLMs has largely focused on domains like code and math, where RL rewards are directly verifiable and…