
arXiv:2605.22081v1. Abstract: We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014–2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shares, comments, and page metadata, enabling joint analysis of language and audience response. The resource includes 200 curated terms (100 racism-related and 100 discrimination-related) with morphological regex families (13+ inflections per lemma), and 20 discrimination axe
The release of a decade-long corpus of Arabic Facebook discourse gives researchers a new resource for studying and countering online discrimination.
Because it pairs post text with engagement signals such as reactions, shares, and comments, the resource supports analysis of how audiences respond to discriminatory language in Arabic, which may help in mitigating harmful content.
ArabDiscrim extends social media analysis beyond Twitter-centric datasets, enabling more nuanced study of racism and discrimination across Arabic-speaking online communities.
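The abstract mentions "morphological regex families" that cover many inflections per lemma. The following is a minimal sketch of what such a family could look like. It assumes a simple prefix-stem-suffix pattern. The stem, the affix lists, and the function name `build_family` are hypothetical illustrations and are not taken from the ArabDiscrim lexicon.

```python
import re

# Hypothetical affix lists for illustration only; the actual ArabDiscrim
# patterns are not reproduced here.
PREFIXES = ["و", "ف", "ب", "ل", "ك", "ال", "وال", "بال", "لل"]
SUFFIXES = ["ة", "ي", "ية", "ين", "ون", "ات", "يين", "يون"]


def _alternation(items):
    # Longest alternatives first, so longer affixes are tried before shorter ones.
    ordered = sorted(items, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(item) for item in ordered) + ")?"


def build_family(stem: str) -> re.Pattern:
    """Compile one regex matching a stem with optional prefixes and suffixes."""
    body = _alternation(PREFIXES) + re.escape(stem) + _alternation(SUFFIXES)
    # Lookarounds keep the match from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)" + body + r"(?!\w)")


# Hypothetical stem: the root shared by "عنصري" and "عنصرية".
family = build_family("عنصر")
for text in ["ضد العنصرية في كل مكان", "خطاب عنصري", "مدرسة"]:
    match = family.search(text)
    print(text, "->", match.group(0) if match else None)
```

A pattern this permissive also matches the bare stem, so a real lexicon would likely need per-lemma curation to limit false positives.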
- AI researchers
- Social media platforms
- Content moderation teams
- Digital rights organizations
- Propagators of online discrimination
- Creators of harmful content
Improved detection and moderation of hate speech in Arabic online spaces.
Development of more culturally sensitive and effective AI models for social discourse analysis.
Enhanced understanding of the socio-linguistic patterns of online discrimination, potentially informing public policy and educational initiatives.
Read at arXiv cs.CL