Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception

arXiv:2604.28048v2. Abstract: Large Language Models (LLMs) are increasingly used as proxies for human perception in urban analysis, yet it remains unclear whether persona prompting produces meaningful and reproducible behavioral diversity. We investigate whether distinct personas influence urban sentiment judgments generated by multimodal LLMs. Using a factorial set of personas spanning gender, economic status, political orientation, and personality, we instantiate multiple agents per persona to evaluate urban scene images from the PerceptSent dataset and assess both with
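The factorial persona design named in the abstract can be illustrated with a minimal sketch. The four dimensions come from the abstract; the levels within each dimension, the number of agents per persona, and all function names here are assumptions made for illustration, not the paper's actual configuration.

```python
from itertools import product

# The four dimensions are taken from the abstract. The levels listed for each
# one are hypothetical placeholders; the abstract does not state them.
PERSONA_DIMENSIONS = {
    "gender": ["woman", "man"],
    "economic_status": ["low-income", "high-income"],
    "political_orientation": ["liberal", "conservative"],
    "personality": ["introverted", "extraverted"],
}


def build_personas(dimensions):
    """Return every combination of levels (a full factorial design)."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


def instantiate_agents(personas, agents_per_persona):
    """Create several replicate agents per persona so that run-to-run
    variation can be compared with variation between personas."""
    return [
        {"persona_id": persona_id, "replicate": replicate, "persona": persona}
        for persona_id, persona in enumerate(personas)
        for replicate in range(agents_per_persona)
    ]


def persona_prompt(persona):
    """Render a persona as a plain-text system prompt (hypothetical wording)."""
    traits = ", ".join(f"{key.replace('_', ' ')}: {value}" for key, value in persona.items())
    return f"You are a city resident with the following profile ({traits})."


personas = build_personas(PERSONA_DIMENSIONS)
agents = instantiate_agents(personas, agents_per_persona=3)  # 3 is an assumed count
print(len(personas), len(agents))  # 16 48
```

With two assumed levels per dimension, the design yields 16 personas; replicating each one is what allows a study of this kind to separate reproducibility within a persona from diversity across personas.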
As LLMs are applied in more fields that require human-like judgment, their reliability and biases under persona prompting need to be better understood.
This research directly addresses the validity and reproducibility of LLM outputs for nuanced tasks, which is critical for their safe and effective deployment as proxies for human perception in sensitive areas like urban analysis.
The findings suggest that simply assigning personas to LLMs may not consistently achieve the desired behavioral diversity, prompting a re-evaluation of current LLM agent design and prompting strategies for social perception tasks.
- AI researchers focusing on explainable AI
- Developers of robust LLM evaluation frameworks
- Ethical AI advocates
- Organizations relying on simple persona prompting for LLM agents
- Users expecting nuanced, diverse opinions from persona-prompted LLMs without val
- LLM applications in social science without rigorous testing
The study highlights potential limitations of current persona-based prompting for generating diverse and reliable 'human-like' insights.
This could lead to a demand for more sophisticated and validated methods for instilling specific perspectives into LLMs, moving beyond superficial persona assignments.
Long-term, it may drive the development of 'personality' architectures within LLMs or specialized fine-tuning approaches grounded in psychological models, rather than prompt engineering alone.
Read at arXiv cs.CL