SIGNAL · AI · May 27, 2026, 4:00 AM · Signal 75 · Medium term

Post-training makes large language models less human-like

arXiv:2605.07632v2. Abstract: Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base mode

Why this matters

Why now

LLMs are proliferating and are increasingly deployed as surrogates for human participants, which makes their behavioral alignment a pressing question and motivates new research such as Psych-201.

Why it’s important

This research indicates that the training processes designed to make LLMs useful also make their behavior less human-like, which poses challenges for applications that require high-fidelity human interaction or psychological modeling.

What changes

The assumption that more advanced or 'post-trained' LLMs are inherently more human-like is challenged, requiring a re-evaluation of model selection and training objectives for specific applications.

Winners

· Researchers focused on base model understanding
· Model developers specializing in pre-training
· Synthetic data generation techniques

Losers

· Applications requiring high human behavioral fidelity from post-trained LLMs
· Those assuming post-training universally improves LLM utility for human simulation
· Current post-training methodologies

Second-order effects

Direct

Increased focus on measuring how closely LLM behavior aligns with human behavior, beyond traditional benchmarks.

Second

Development of new post-training techniques specifically aimed at preserving or enhancing human-like behavior.

Third

Potential for a divergence in LLM development, with some optimized for utility and others for human behavioral accuracy.

Editorial confidence: 90 / 100 · Structural impact: 60 / 100

Original report

Read at arXiv cs.LG

#cs.CL #cs.AI #cs.LG
