
arXiv:2606.20205v1 Announce Type: cross Abstract: Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings
This research arrives as psychological assessments of LLMs become prevalent, driven by the rapid development and deployment of these models across applications.
This study challenges the validity of a common approach to evaluating LLM behavior, with direct consequences for safety assessment, usability, and the use of LLMs as human proxies in research.
The perceived 'psychological profiles' of LLMs are now framed as largely an artifact of measurement, necessitating a re-evaluation of current assessment methodologies and the insights derived from them.
- LLM developers investing in robust, LLM-specific evaluation methods
- Academic researchers focused on AI ethics and measurement theory
- Organizations prioritizing scientific rigor in AI safety assessments
- Platforms and researchers relying solely on human psychological instruments for
- Projects that have made significant claims based on LLM 'personalities'
- The narrative that LLMs possess stable psychological traits akin to humans
Demand is likely to grow for new, AI-native methodologies to assess LLM safety, bias, and operational characteristics.
Public perception and regulatory approaches to LLMs may shift away from anthropomorphic psychological interpretations, leading to more technical and functional evaluations.
This could accelerate the professionalization of 'LLM psychometrics' as a distinct field, attracting new research and investment into specific AI evaluation frameworks.
Read at arXiv cs.CL