Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

arXiv:2604.08362v2. Abstract: The emergence of Large Language Models (LLMs) has illuminated the potential for a general-purpose user simulator. However, existing benchmarks remain constrained to isolated scenarios, narrow action spaces, or synthetic data, failing to capture the holistic nature of authentic human behavior. To bridge this gap, we introduce OmniBehavior, the first user simulation benchmark constructed entirely from real-world data, integrating long-horizon, cross-scenario, and heterogeneous behavioral patterns into a unified framework. Based on this be
Rapid advances in LLMs have made realistic human behavior simulation a tangible research goal, as evidenced by new benchmarks such as OmniBehavior.
Sophisticated human behavior simulation is critical for building more robust, general-purpose AI agents and for accelerating development in fields ranging from robotics to social science and economics.
The benchmark evaluates whether LLMs can simulate complex, long-horizon, cross-scenario real-world human behavior, moving beyond the isolated or synthetic environments of prior benchmarks.
- AI agent developers
- Simulation platform companies
- Robotics companies
- Social science researchers
- Companies relying on narrow AI simulations
- Traditional human subject research (for some applications)
LLMs can be effectively trained and tested in more realistic simulated environments, accelerating AI development cycles.
Advanced AI agents, having 'learned' in robust simulations, can perform complex tasks autonomously in the real world more reliably.
The development of highly realistic human simulators could lead to transformative applications in drug discovery, policy testing, and even societal-scale experiments, raising new ethical considerations.
Read at arXiv cs.LG