An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models

arXiv:2606.09843v1 (announce type: cross)

Abstract: Large language models (LLMs) produce stable self-reports on personality inventories, but these self-reports do not predict observed behavior. Whether this gap reflects a mismatch between LLMs and human trait constructs, or a deeper property of LLM self-report itself, has been unresolved. We constructed the first psychometric instrument whose constructs are derived bottom-up from LLM behavioral affordances via exploratory factor analysis (EFA). We administered 300 items (240 direct Likert + 60 scenario-based) spanning 12 candidate behavioral dimensions …
This work is timely: as increasingly capable LLMs are deployed in critical applications, there is a growing need to understand their internal 'psychology' and to predict how they will behave.
This research reveals a fundamental limitation in current methods for understanding and predicting LLM behavior: self-reports mislead even when the instrument is built from LLM-native rather than human trait constructs, so measurement tools that do not depend on self-report are needed.
The findings undercut the assumption that LLMs can accurately self-report their behavioral tendencies or internal states, calling for a re-evaluation of how advanced AI systems are assessed and aligned, particularly as they gain autonomy.
- AI safety researchers
- Transparency tools developers
- New psychometric frameworks
- LLM anthropomorphizers
- Developers relying on self-reporting for alignment
Researchers will need new, LLM-specific methods for understanding and predicting model behavior; moving away from human-centric psychological constructs is not enough on its own if those methods still rely on self-report.
This insight could accelerate the development of more robust, auditable AI systems that do not rely on unreliable self-reports.
A better understanding of LLM behavior could lead to more predictable and safer autonomous AI agents, and to a more nuanced public perception of their 'intelligence' and 'consciousness'.
This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.
Source: arXiv cs.CL