
arXiv:2606.30454v1 (Announce Type: cross)

Abstract: Large language models (LLMs) are increasingly used as agents in simulations of social systems, yet it remains unclear when their behavior can be interpreted as a faithful proxy for human decision-making. Here we test LLM agents against a direct empirical benchmark: a large-scale networked Prisoner's Dilemma experiment with human participants. Using the same interaction protocol, payoff structure, and network topologies, we compare nine open-weight LLMs with the human data. The selected model reproduces several macro-level features of cooperatio…
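The comparison described in the abstract rests on a networked Prisoner's Dilemma, in which agents sit on the nodes of a graph and play the game with their neighbors. The sketch below is a minimal illustration under stated assumptions, not the paper's protocol: the payoff values, the rule that each agent uses one action against all of its neighbors, and the names `play_round` and `cooperation_rate` are hypothetical. It computes per-agent payoffs for a single round and the fraction of cooperators, the kind of aggregate quantity one might compare between LLM-agent runs and human data.

```python
from typing import Dict, List, Tuple

# Hypothetical payoffs satisfying T > R > P > S; these are illustrative
# assumptions, not the values used in the study.
PAYOFFS: Dict[Tuple[str, str], int] = {
    ("C", "C"): 3,  # R: reward for mutual cooperation
    ("C", "D"): 0,  # S: sucker's payoff
    ("D", "C"): 5,  # T: temptation to defect
    ("D", "D"): 1,  # P: punishment for mutual defection
}


def play_round(edges: List[Tuple[int, int]], actions: Dict[int, str]) -> Dict[int, int]:
    """Assumed convention: each node plays one game per edge, using the same
    action against every neighbor, and sums the payoffs it receives."""
    totals = {node: 0 for node in actions}
    for u, v in edges:
        totals[u] += PAYOFFS[(actions[u], actions[v])]
        totals[v] += PAYOFFS[(actions[v], actions[u])]
    return totals


def cooperation_rate(actions: Dict[int, str]) -> float:
    """Macro-level statistic: fraction of agents that chose to cooperate."""
    return sum(a == "C" for a in actions.values()) / len(actions)


# Usage: a hypothetical 4-node ring in which agent 2 defects.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
acts = {0: "C", 1: "C", 2: "D", 3: "C"}
payoffs = play_round(ring, acts)
rate = cooperation_rate(acts)
print(payoffs)  # {0: 6, 1: 3, 2: 10, 3: 3}
print(rate)     # 0.75
```

In this toy run the lone defector earns the most, which is the tension that makes cooperation on networks a non-trivial collective outcome worth benchmarking.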
The proliferation of LLMs and their increasing use in agentic contexts necessitate a deeper understanding of how faithfully they reproduce real-world human behavior in complex social interactions.
This research provides a direct empirical benchmark for whether LLM agents can serve as proxies for human decision-making, which bears on their usefulness in economic, social, and policy simulations.
The ability to simulate complex human social dynamics with LLM agents, even without perfect individual fidelity, opens new avenues for understanding and predicting collective behaviors.
- AI researchers
- Social scientists
- Developers of agentic LLM systems
- Predictive models relying solely on individual LLM fidelity
LLM agents will be more widely adopted for simulations requiring collective behavior rather than precise individual mimicry.
This improved understanding of collective LLM agent behavior could lead to new applications in policy testing and emergent risk identification.
The development of LLM-driven 'digital populations' for advanced social and economic modeling could become a new frontier in computational social science.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Read at arXiv cs.AI