
arXiv:2602.05302v3 Announce Type: replace Abstract: We present an in-depth evaluation of LLMs' ability to negotiate, a central business task requiring strategic reasoning, theory of mind, and economic value creation. To do so, we introduce PieArena, a large-scale negotiation benchmark grounded in multi-agent interactions over realistic scenarios adapted from MBA negotiation courses at an elite business school. We evaluate language agents across three pairing regimes: mirror-play, cross-play, and human-LM play. We develop a ranking model for continuous negotiation payoffs that yields order-inva
As large language models advance rapidly, evaluating complex multi-agent interactions such as negotiation becomes a critical next step toward real-world deployment.
This benchmark provides a standardized, realistic way to measure and improve AI's ability to conduct high-value business negotiations, with implications for white-collar productivity and strategic operations.
The ability to rigorously rank and profile language agents in negotiation scenarios will accelerate the development of more capable and trustworthy AI agents for complex business tasks.
- AI agent developers
- Businesses adopting AI agents
- Elite business schools and their curricula
- White-collar workers in repetitive negotiation roles
- Current simplistic AI evaluation frameworks
More sophisticated and reliable AI agents will emerge for complex business interactions.
Human negotiators will increasingly be augmented or replaced by AI in routine to moderately complex scenarios.
What counts as strategic reasoning, and where humans hold a competitive advantage in business, will shift toward areas less susceptible to AI automation.
Read at arXiv cs.AI