SIGNAL · AI · Jun 1, 2026, 4:00 AM · Signal 75 · Medium term

Human Psychometric Questionnaires Mischaracterize LLM Behavior

arXiv:2509.10078v4 · Announce Type: replace-cross

Abstract: We examine whether human psychometric questionnaires can serve as reliable tools for characterizing and predicting LLM behavior in everyday user interactions. We analyze eight open-source LLMs by comparing their value and personality profiles derived from two different methods: Likert self-reports on established questionnaires (PVQ-40/21 and BFI-44/10) and generation probabilities over value-laden responses to everyday user queries. The two profiles diverge substantially. Within-construct item consistency, often cited as evidence of sta
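
To make the comparison described in the abstract concrete, here is a minimal sketch of how a self-report profile and a probability-based profile could be set against each other. It is not the paper's implementation: the construct names, the toy ratings and log-probabilities, the softmax aggregation, and the choice of Spearman rank correlation as the agreement measure are all assumptions made for illustration.

```python
import math
from statistics import mean


def likert_profile(item_scores):
    """Average Likert ratings per construct: {construct: [ratings]} -> {construct: mean}."""
    return {construct: mean(ratings) for construct, ratings in item_scores.items()}


def probability_profile(response_logprobs):
    """Softmax over candidate-response log-probabilities, summed per construct.

    response_logprobs is a list of (construct, logprob) pairs for one query.
    """
    top = max(lp for _, lp in response_logprobs)  # subtract the max for numerical stability
    weights = [(c, math.exp(lp - top)) for c, lp in response_logprobs]
    total = sum(w for _, w in weights)
    profile = {}
    for construct, weight in weights:
        profile[construct] = profile.get(construct, 0.0) + weight / total
    return profile


def ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(a, b):
    """Spearman rank correlation over the constructs both profiles share.

    Assumes neither profile is constant; a constant profile would divide by zero.
    """
    keys = sorted(set(a) & set(b))
    ra = ranks([a[k] for k in keys])
    rb = ranks([b[k] for k in keys])
    ma, mb = mean(ra), mean(rb)
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    var_a = sum((x - ma) ** 2 for x in ra)
    var_b = sum((y - mb) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical toy data: three constructs, invented ratings and log-probabilities.
self_report = {"benevolence": [5, 6, 5], "power": [2, 1, 2], "security": [4, 4, 3]}
logprobs = [("benevolence", -2.3), ("power", -0.9), ("security", -1.6)]

rho = spearman(likert_profile(self_report), probability_profile(logprobs))
print(f"Spearman correlation between the two profiles: {rho:.2f}")
```

In this toy case the self-report ranks benevolence highest while the response probabilities favor power, so the correlation is strongly negative. A value near 1 would instead indicate that the questionnaire profile tracks the behavior-derived one.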

Why this matters

Why now

The proliferation of LLMs and growing attempts to integrate them into sensitive applications call for robust methods for understanding their internal states and predicting their behavior.

Why it’s important

This research highlights a fundamental challenge in assessing LLM behavior and aligning it with human values, with implications for development, regulation, and trust.

What changes

Relying on traditional human psychometric questionnaires to evaluate LLMs is now questionable; characterizing LLM 'personalities' and 'values' requires new methods.

Winners

· AI safety researchers
· Developers of new LLM evaluation methodologies
· Transparency and interpretability startups

Losers

· Companies relying on simple questionnaire-based LLM assessments
· Researchers using outdated psychometric tools for AI evaluation

Second-order effects

Direct

LLM developers will need to find alternative or more sophisticated methods to characterize the 'values' and 'personalities' of their models.

Second

Public and regulatory trust in LLMs, especially in sensitive applications, could be undermined if models are found to deviate from expected values despite questionnaire results.

Third

A new industry or sub-field could emerge focused on developing novel, AI-specific psychometric and behavioral assessment tools for large language models.

Editorial confidence: 90 / 100 · Structural impact: 55 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.AI

#cs.CL #cs.AI
