A new research paper examines the mismatch between reasoning capability and behavioral simulation in large language models used for multi-agent negotiation. The study found that models such as DeepSeek and OpenAI's GPT-5.2, when selected for their reasoning abilities, often defaulted to authority-driven outcomes rather than negotiated ones. The paper argues that evaluating models on their intended behavioral role, not just their strategic capability, is crucial for accurate institutional simulations.
Impact: Highlights the need to evaluate LLMs for specific behavioral roles in simulations, not just raw strategic capability.
Ranking rationale: The cluster contains an arXiv paper detailing research findings on LLM behavior.
AI-generated summary · Google Gemini · from 1 source.