
arXiv:2605.31349v1 Announce Type: cross Abstract: Hateful meme detection remains a formidable challenge for vision-language models, as existing benchmarks are structurally observational - confounding rhetorical hate mechanisms with target community features and preventing causal evaluation of model vulnerabilities. To address this, we introduce FBHM, a systematically curated benchmark of Functionality Based Hateful Memes constructed along two orthogonal axes: 25 distinct rhetorical functionalities and 10 target communities (5,000 memes total). Benchmarking state-of-the-art VLMs reveals a sever
The proliferation of multimodal content, combined with the increasing sophistication of VLMs, has highlighted limitations in current detection methods for nuanced harmful content like hateful memes, necessitating focused research and improved benchmarks.
Better detection of harmful online content is crucial for platform moderation, combating misinformation, and reducing social polarization, all of which affect public discourse and platform liability.
The introduction of FBHM offers a more robust and functionally-based benchmark for evaluating and steering VLMs in hateful meme detection, moving beyond observational confounding toward causal evaluation of model vulnerabilities.
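A minimal sketch of what evaluating along the two axes could look like, assuming the 5,000 memes are spread evenly over the 25 x 10 grid (the abstract does not state this) and that each prediction is available as a tuple of functionality, community, gold label, and predicted label. The function names and the record layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Figures quoted in the abstract.
N_FUNCTIONALITIES = 25
N_COMMUNITIES = 10
TOTAL_MEMES = 5000


def memes_per_cell(total=TOTAL_MEMES, n_func=N_FUNCTIONALITIES, n_comm=N_COMMUNITIES):
    """Memes per (functionality, community) cell, ASSUMING a perfectly balanced grid."""
    cells = n_func * n_comm
    if total % cells != 0:
        raise ValueError("total does not divide evenly across the grid")
    return total // cells


def accuracy_by_axis(records):
    """Return (accuracy per functionality, accuracy per community).

    `records` is a hypothetical iterable of
    (functionality, community, gold_label, predicted_label) tuples.
    """
    func_counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    comm_counts = defaultdict(lambda: [0, 0])
    for func, comm, gold, pred in records:
        hit = int(gold == pred)
        for counts, key in ((func_counts, func), (comm_counts, comm)):
            counts[key][0] += hit
            counts[key][1] += 1
    by_func = {k: c / n for k, (c, n) in func_counts.items()}
    by_comm = {k: c / n for k, (c, n) in comm_counts.items()}
    return by_func, by_comm


if __name__ == "__main__":
    # Toy records with made-up labels, for illustration only.
    toy = [
        ("F1", "C1", 1, 1),
        ("F1", "C2", 1, 0),
        ("F2", "C1", 0, 0),
        ("F2", "C2", 1, 1),
    ]
    print(memes_per_cell())
    print(accuracy_by_axis(toy))
```

Reporting accuracy separately per functionality and per community is one way to see whether errors track the rhetorical mechanism or the targeted group, which is the confound the benchmark is described as removing.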
- Social Media Platforms
- AI Safety Researchers
- Content Moderation Services
- AI Ethics & Governance Bodies
- Creators of Hateful Memes
- VLMs with limited 'rhetorical functionality' understanding
Improved VLM performance in identifying and mitigating online hate speech contained within multimodal content.
Reduced prevalence of hateful memes on major platforms, potentially leading to a less toxic online environment for users.
Enhanced public trust in AI moderation systems and a shift towards more nuanced, context-aware content policies across the internet.
Read at arXiv cs.AI