Same Model, Different Weakness: How Language and Modality Reshape the Jailbreak Attack Surface in Frontier MLLMs

arXiv:2605.23157v1. Abstract: The attack surface of a multimodal large language model (MLLM) is language-dependent in ways that reveal the mechanistic structure of alignment failures. We present the first systematic cross-lingual, multimodal red-teaming study comparing jailbreak vulnerability in US English (en-US) and Mexican Spanish (es-MX) across four frontier MLLMs: Claude Sonnet 4.5, GPT-5, Pixtral Large, and Qwen Omni. Using a fixed adversarial benchmark of 363 diverse prompt scenarios administered in text-only and multimodal conditions, we collected 52,272 harm ratings
As MLLMs grow more capable and are integrated into global applications, their vulnerabilities need to be understood across diverse linguistic and multimodal contexts.
The study indicates that AI safety and alignment do not transfer uniformly across languages and modalities: frontier MLLMs show distinct language- and modality-dependent weaknesses that can be exploited, which affects their global deployment and trustworthiness.
MLLM security can no longer be treated as a single, language-independent property. The attack surface varies with language and modality, which calls for more localized and culturally aware red-teaming and safety measures.
- AI safety researchers
- Multilingual AI developers
- Governments focused on AI regulation
- Monolingual AI development methodologies
- Companies with solely English-centric safety protocols
- Users in non-English speaking markets vulnerable to exploits
AI developers will need to conduct more extensive and localized red-teaming for their MLLMs, factoring in both language and modality.
Localized testing could in turn fragment AI safety standards and practices, with different regions or languages requiring tailored alignment strategies and resources.
Increased awareness of language-dependent vulnerabilities may spark national efforts to develop and secure AI models in local languages, reducing reliance on models primarily developed and tested in English.
Read at arXiv cs.CL