
arXiv:2605.26947v1 (announce type: new)
Abstract: Kazakh is underrepresented in resources for evaluating the safety behavior of large language models. We present KZ-SafetyPrompts, a Kazakh prompt dataset for safety evaluation across eleven categories covering common risk areas such as self-harm, violence, child exploitation, sexual content, racist content, radicalization, and regulated goods or illegal activities. The dataset contains 5,717 prompts written natively in Kazakh (Cyrillic), organized by category, with English translations for cross-lingual analysis. Prompts resemble realistic user q
The proliferation of Large Language Models (LLMs) and growing concerns over their safety and cultural bias are driving the development of diverse evaluation datasets worldwide.
KZ-SafetyPrompts addresses the need for culturally and linguistically specific safety evaluation. It is part of a broader effort to ensure AI systems are safe and unbiased for diverse populations, rather than being tested only against datasets in dominant languages.
KZ-SafetyPrompts gives evaluators a foundational resource for testing LLM safety in Kazakh, which can support better model alignment and reduce the risk of harmful outputs for Kazakh-speaking users.
- Kazakh-speaking AI users
- Developers building LLMs for Central Asian markets
- AI safety researchers
- Kazakh cultural preservation groups
- LLM developers ignoring linguistic diversity
- Censorship regimes (potentially, as safety standards increase)
Improved safety and cultural relevance of LLMs deployed in Kazakh-speaking regions.
Increased demand for similar safety evaluation datasets in other underrepresented languages, fostering a more inclusive global AI ecosystem.
Potential for national-level AI safety regulations and standards to emerge, reflecting specific cultural and ethical considerations beyond Western norms.
Read at arXiv cs.CL