Is GPT-4o mini Blinded by its Own Safety Filters? Exposing the Multimodal-to-Unimodal Bottleneck in Hate Speech Detection

arXiv:2509.13608v2. Abstract: As Large Multimodal Models (LMMs) become integral to daily digital life, understanding their safety architectures is a critical problem for AI Alignment. This paper presents a systematic analysis of OpenAI's GPT-4o mini, a globally deployed model, on the difficult task of multimodal hate speech detection. Using the Hateful Memes Challenge dataset, we conduct a multi-phase investigation on 500 samples to probe the model's reasoning and failure modes. Our central finding is the experimental identification of a "Unimodal Bottleneck," an architec
The spread of LMMs such as GPT-4o mini into real-world applications creates an immediate need to understand and address their limitations, especially regarding safety and bias. The paper speaks directly to current concerns about AI safety and robustness.
This research reports a critical vulnerability in widely deployed multimodal AI models: aggressive safety filters can inadvertently impair a model's ability to detect subtle but important issues such as hate speech. Strategic readers should care because this affects the trustworthiness and effectiveness of AI systems in critical societal applications.
The finding changes how LMM safety mechanisms should be understood: it highlights a 'unimodal bottleneck' that may cause underperformance on complex multimodal tasks despite the safety intent. This suggests a need for more careful calibration of safety features.
- AI safety researchers
- Companies developing advanced multimodal safety architectures
- Organizations focused on ethical AI deployment
- Developers relying solely on current LMM safety filters for complex content moderation
- Users employing LMMs for nuanced hate speech detection without independent verification
OpenAI and other LMM developers will likely need to re-evaluate and refine their safety filtering mechanisms to mitigate the unimodal bottleneck.
Increased scrutiny and demand for transparent testing of LMMs' real-world safety capabilities will emerge, potentially leading to new industry standards.
A potential shift towards more explicit, context-aware AI safety architectures rather than blanket filters could foster more robust and less 'blind' AI moderation.
Read at arXiv cs.LG