
arXiv:2606.25380v1
Abstract: Large language models (LLMs) are increasingly deployed across languages, but their safety behavior remains uneven across linguistic and cultural contexts. This survey synthesizes work on toxicity detection and detoxification for multilingual LLMs. We first catalogue threat models that exploit language choice, translation pivots, code-switching, orthographic variation, multi-turn interaction, and post-deployment fine-tuning to weaken safety alignment. We then organize task formulations (toxic-to-neutral rewriting, toxicity classification, and toxi
As LLMs are deployed globally, addressing safety and toxicity across diverse linguistic and cultural contexts has become urgent.
LLM safety in multilingual environments is essential for building trust, deploying models ethically, and preventing harmful content from spreading across borders.
This survey provides a structured synthesis of current challenges and mitigation strategies, supporting more targeted development and policy work on multilingual LLM safety.
- AI Safety Researchers
- Multinational Tech Companies
- International Organizations
- Malicious Actors
- Unregulated AI Deployments
Improved understanding of, and tools for, detecting and mitigating toxicity in multilingual LLMs.
Reduced incidence of harmful AI-generated content in non-English-speaking regions, fostering greater global adoption.
Potential development of internationally standardized safety protocols for AI, with effects on regulatory landscapes.
Read at arXiv cs.CL