Adapting Reinforcement Learning with Chain-of-Thought Supervision for Explainable Detection of Hateful and Propagandistic Memes

arXiv:2606.15307v1. Abstract: Hateful and propagandistic memes exploit the interplay between images and text to convey harmful intent that neither modality reveals alone. Although thinking-based multimodal large language models (MLLMs) have advanced vision-language understanding, their application to meme content moderation remains underexplored. We propose a reinforcement learning-based post-training method that improves classification performance and reference-based explanation quality in thinking-based MLLMs via task-specific rewards and Group Relative Policy Optimization
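The abstract names two ingredients, task-specific rewards and Group Relative Policy Optimization, without giving their details here. As a rough illustration only, the sketch below shows the group-relative advantage step commonly associated with GRPO (each sampled response's reward is normalized against the mean and standard deviation of its own sampling group) together with a hypothetical reward. The function names, the weights `w_cls` and `w_exp`, the token-overlap F1 used as an explanation score, and the use of population standard deviation are all assumptions made for this example, not the paper's actual reward design.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group.

    Assumption: population standard deviation; implementations differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(pred_label, gold_label, explanation, reference,
               w_cls=1.0, w_exp=0.5):
    """Hypothetical reward: label correctness plus token-overlap F1
    between the generated explanation and a reference explanation."""
    cls_score = 1.0 if pred_label == gold_label else 0.0

    pred_tokens = set(explanation.lower().split())
    ref_tokens = set(reference.lower().split())
    overlap = len(pred_tokens & ref_tokens)
    if overlap == 0:
        f1 = 0.0
    else:
        precision = overlap / len(pred_tokens)
        recall = overlap / len(ref_tokens)
        f1 = 2 * precision * recall / (precision + recall)

    return w_cls * cls_score + w_exp * f1


if __name__ == "__main__":
    # Four sampled responses to the same meme (illustrative values only).
    reference = "the image and caption together mock a religious group"
    samples = [
        ("hateful", "the caption and image together mock a religious group"),
        ("not-hateful", "the meme is a harmless joke"),
        ("hateful", "offensive text"),
        ("not-hateful", ""),
    ]
    rewards = [toy_reward(label, "hateful", expl, reference)
               for label, expl in samples]
    for r, a in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={r:.3f} advantage={a:+.3f}")
```

Because the advantages are computed relative to the group, responses that beat their siblings get a positive signal and the rest a negative one, with no separate value model needed. When every response in a group earns the same reward, all advantages are zero and that group contributes no gradient.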
The proliferation of harmful online content necessitates advanced moderation techniques, with AI-based approaches becoming increasingly sophisticated to address complex multimodal threats like hateful memes.
The work is a step toward more explainable and effective AI moderation, which matters for platforms facing regulatory pressure and societal demands to combat misinformation and hate speech.
The ability of AI models to not only detect but also explain their decisions regarding harmful multimodal content improves transparency and trustworthiness in automated moderation systems.
- Social media platforms
- Content moderation services
- AI ethics researchers
- Online users
- Purveyors of hate speech and propaganda
- Manual content moderators (long-term increased automation)
Improved detection and explanation capabilities for harmful memes could lead to cleaner online environments.
Increased trust in AI moderation could lead to broader adoption of AI for diverse content governance challenges.
More sophisticated AI content moderation could spark an 'arms race' between AI detection and AI-generated harmful content, requiring continuous innovation.
Read at arXiv cs.CL