
arXiv:2605.30804v1 (Announce Type: new)

Abstract: We audit six large language models (LLMs) for gender stereotyping across English, Korean, Chinese, and Japanese. Three were developed primarily for English-language use (Claude, GPT, Gemini) and three for East Asian use (DeepSeek, Syn-Pro, HyperCLOVA X). We adopt the HEXACO-100 personality inventory and anchor each model against a cross-cultural human dataset spanning 48 countries to ask not whether LLMs are biased, but how far their gender attributions drift from the populations they are deployed among. Our findings show that their stereotyping …
As large language models (LLMs) are deployed across more languages and cultures, the biases they carry need rigorous auditing.
Measuring cross-cultural gender bias in LLMs against human baselines supports responsible AI development, helps mitigate social harms, and informs ethical deployment in global markets.
This research offers a quantifiable way to compare LLM gender attributions with those of human populations, shifting the AI-ethics question from whether a model is biased to how biased it is and how far it departs from the people it serves.
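The abstract does not spell out the drift metric. As a hedged illustration only, one common way to quantify a gender gap on a personality domain is Cohen's d (a standardized mean difference). Drift can then be expressed as the model's d minus the human baseline's d on the same HEXACO domain. The sketch below assumes that metric; the data layout (`{"F": [...], "M": [...]}` per domain), the function names `cohens_d` and `gender_gap_drift`, and the toy scores are all hypothetical and are not the authors' pipeline.

```python
from math import sqrt
from statistics import mean, variance

# Standard HEXACO domain codes: Honesty-Humility, Emotionality, eXtraversion,
# Agreeableness, Conscientiousness, Openness to Experience.
HEXACO_FACTORS = ("H", "E", "X", "A", "C", "O")

# Hypothetical layout: domain code -> {"F": scores for female targets, "M": scores for male targets}
Scores = dict[str, dict[str, list[float]]]


def cohens_d(female: list[float], male: list[float]) -> float:
    """Standardized mean difference (female minus male) using the pooled sample SD."""
    n_f, n_m = len(female), len(male)
    pooled_var = ((n_f - 1) * variance(female) + (n_m - 1) * variance(male)) / (n_f + n_m - 2)
    return (mean(female) - mean(male)) / sqrt(pooled_var)


def gender_gap_drift(model: Scores, human: Scores) -> dict[str, float]:
    """Assumed drift metric: model gender gap minus human gender gap, per HEXACO domain.

    Positive values mean the model's female-minus-male gap on that domain is larger
    than the human baseline's; domains missing from either input are skipped.
    """
    return {
        f: cohens_d(model[f]["F"], model[f]["M"]) - cohens_d(human[f]["F"], human[f]["M"])
        for f in HEXACO_FACTORS
        if f in model and f in human
    }


if __name__ == "__main__":
    # Toy, made-up Emotionality scores on a 1-5 scale (not data from the paper).
    human = {"E": {"F": [2.0, 3.0, 4.0], "M": [1.0, 2.0, 3.0]}}
    model = {"E": {"F": [3.0, 4.0, 5.0], "M": [1.0, 2.0, 3.0]}}
    print(gender_gap_drift(model, human))  # {'E': 1.0}
```

Using d rather than raw mean differences keeps domains with different score spreads comparable. A fuller audit along the lines the abstract describes would also need separate baselines per language and country, which this sketch leaves out.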
- AI ethics researchers
- Developers of inclusive AI models
- Governments setting AI regulation

- LLMs with unmitigated biases
- Companies deploying unaudited LLMs
- Users experiencing biased AI interactions
Increased focus on culturally specific bias detection and mitigation techniques in LLM development.
Development of regulatory frameworks and industry standards requiring cross-cultural bias audits for AI models.
Market differentiation emerging for 'culturally intelligent' or 'bias-aware' AI platforms, impacting adoption in diverse regions.
Read at arXiv cs.CL