Predicting Effects, Missing Distributions: Evaluating LLMs as Human Behavior Simulators in Operations Management

arXiv:2510.03310v2. Abstract: Large language models (LLMs) are increasingly used to simulate human behavior in business, economics, and the social sciences, offering a low-cost complement to laboratory experiments, field studies, and surveys. This paper evaluates how well LLMs replicate human behavior in operations management. Using nine published behavioral-operations experiments, we assess LLM performance along two dimensions: whether LLM-generated data reproduce the original hypothesis-test outcomes, and whether their full response distributions align with human data,
The rapid advancement and accessibility of large language models have created an urgent need to rigorously evaluate their capabilities as simulators for complex human behaviors.
Understanding the fidelity of LLMs as human behavior simulators is crucial for their reliable deployment in business, economics, and social sciences, potentially transforming research methodologies and operational efficiencies.
This research provides a framework for assessing how well LLMs replicate human experimental outcomes. It goes beyond checking whether hypothesis tests come out the same way and compares full response distributions, which helps clarify when LLM-generated data can complement, or stand in for, traditional studies.
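The two evaluation dimensions can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the numbers are invented toy data, not results from the paper, and the helper names (`welch_t`, `effect_replicates`, `ks_statistic`), the Welch t statistic, the rough cutoff of 2.2, and the Kolmogorov-Smirnov statistic are stand-ins, since the abstract as quoted here does not say which tests or distance measures the authors use.

```python
import bisect
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means between samples a and b."""
    mean_gap = statistics.mean(a) - statistics.mean(b)
    std_err = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_gap / std_err


def effect_replicates(human_t, llm_t, critical=2.2):
    """True when both t statistics have the same sign and exceed a rough cutoff.

    The cutoff is an illustrative assumption, not a value from the paper.
    """
    return human_t * llm_t > 0 and min(abs(human_t), abs(llm_t)) > critical


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the sorted sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


# Hypothetical toy data: responses under a control and a treated condition.
human_control = [40, 55, 48, 62, 35, 58, 45, 52]
human_treated = [60, 75, 68, 82, 55, 78, 65, 72]
# Simulated responses that shift the same way but cluster far too tightly.
llm_control = [49, 50, 50, 51, 50, 49, 51, 50]
llm_treated = [69, 70, 70, 71, 70, 69, 71, 70]

human_t = welch_t(human_treated, human_control)
llm_t = welch_t(llm_treated, llm_control)
replicates = effect_replicates(human_t, llm_t)
gap = ks_statistic(human_treated, llm_treated)

print(f"human t = {human_t:.2f}, LLM t = {llm_t:.2f}, effect replicates: {replicates}")
print(f"KS gap between human and LLM treated responses: {gap:.2f}")
```

In this toy setup the treatment effect has the same sign and clears the cutoff in both samples, yet the simulated responses are far less dispersed than the human ones, so the distributional gap stays large. That is the kind of divergence between the two dimensions that the title points to.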
- AI/ML research labs
- Operations management researchers
- Businesses using LLMs for simulation
- Traditional behavioral research consultancies
- Organizations relying on untested LLM simulations
LLMs will be increasingly used in operational design and strategic planning as a low-cost alternative to human experiments.
Ethical concerns about simulating human behavior will intensify, requiring new regulatory frameworks and oversight for LLM deployment in sensitive areas.
The development of LLMs will shift towards models specifically optimized for behavioral simulation, potentially leading to new architectures and training paradigms focused on mimicking human cognitive processes more accurately.
Read at arXiv cs.LG