Brief · PulseAugur

TOOL · arXiv cs.AI · English (EN) · 19h

PieArena: Ranking and Profiling Language Agents in Realistic Negotiation Scenarios

Researchers have introduced PieArena, a benchmark for evaluating the negotiation capabilities of large language models. It uses realistic scenarios adapted from MBA negotiation courses and assesses models across several pairing regimes, including human-AI interactions. Beyond outcome scores, the evaluation produces a multi-dimensional behavioral profile covering aspects such as instruction compliance, deception, and reputation. Notably, the frontier model GPT-5 performed on par with or better than human baselines in these negotiation tasks.

IMPACT Offers a benchmark for evaluating LLM strategic reasoning and negotiation, which could drive improvements in agentic capabilities for business applications.

GPT-5
Yu Zhu
PieArena