
arXiv:2606.01196v1
Abstract: Safety alignment learned in high-resource languages transfers poorly to low-resource languages. Models refuse harmful prompts in English but fail to refuse when the same prompts are translated into Swahili or Burmese. Adaptive steering methods like AdaSteer and CAST inherit this failure cross-lingually. We diagnose where transfer breaks down. Across Qwen2.5-7B, Gemma-2-9B, and Llama-3.1-8B on 23 languages, the harmfulness direction extracted from high-resource activations linearly separates harmful from harmless low-resource prompts nearly as wel
The proliferation of advanced AI models across diverse linguistic contexts is exposing fundamental limitations in safety alignment, particularly for non-English languages.
This research highlights a critical vulnerability in global AI deployment: current safety mechanisms are not universally robust, and their gaps could lead to unintended harms or strategic misuse in low-resource language environments.
The finding that AI safety failures in low-resource languages stem from action failures (the model does not refuse) rather than representation failures (the model does not recognize harm) shifts the focus for developing more effective and inclusive safety alignment strategies.
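The representation side of this claim can be illustrated with a toy sketch. The code below is not the paper's method or data: it assumes a difference-of-means "harmfulness direction", synthetic Gaussian vectors standing in for model activations, and a hypothetical language-specific offset. It only shows how one might measure whether a direction fitted on high-resource examples still separates low-resource examples; it does not model refusal behavior.

```python
import numpy as np

def harmfulness_direction(harmful, harmless):
    """Unit difference-of-means direction between two sets of activations."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def separation_auroc(direction, harmful, harmless):
    """AUROC of the 1-D projections: the probability that a random harmful
    example projects higher than a random harmless one (ties count half)."""
    pos = harmful @ direction
    neg = harmless @ direction
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)

def synthetic_activations(rng, n, dim, shift, offset):
    """Toy stand-in for hidden states (assumption): Gaussian noise plus a
    per-language offset, with harmful examples shifted along `shift`."""
    harmless = rng.normal(size=(n, dim)) + offset
    harmful = rng.normal(size=(n, dim)) + offset + shift
    return harmful, harmless

rng = np.random.default_rng(0)
dim = 64

# Assumption: both languages share one harmfulness axis (coordinate 0).
shift = np.zeros(dim)
shift[0] = 3.0

hi_offset = np.zeros(dim)          # hypothetical high-resource language
lo_offset = rng.normal(size=dim)   # hypothetical low-resource language offset

hi_harmful, hi_harmless = synthetic_activations(rng, 500, dim, shift, hi_offset)
lo_harmful, lo_harmless = synthetic_activations(rng, 500, dim, shift, lo_offset)

# Fit the direction on high-resource data only, then evaluate on both.
direction = harmfulness_direction(hi_harmful, hi_harmless)
auroc_hi = separation_auroc(direction, hi_harmful, hi_harmless)
auroc_lo = separation_auroc(direction, lo_harmful, lo_harmless)

print(f"high-resource AUROC: {auroc_hi:.3f}")
print(f"low-resource AUROC:  {auroc_lo:.3f}")
```

AUROC is used here because it depends only on the ordering of the projections, so a constant per-language offset does not change it. With real models, the arrays would be replaced by hidden states collected at a chosen layer, which this sketch leaves out.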
- AI safety researchers focused on multilingualism
- Organizations developing culturally nuanced AI safety standards
- Users of low-resource languages
- AI models with English-centric safety alignment
- Developers neglecting multilingual safety testing
- Governments relying solely on high-resource language safety benchmarks
AI models will continue to exhibit safety vulnerabilities when deployed in diverse linguistic environments.
There will be increased pressure for AI developers to invest significantly in multilingual safety research and culturally aware alignment techniques.
The disparity in AI safety performance across languages could exacerbate digital divides and create new vectors for information manipulation or control.
Read at arXiv cs.CL