Signal AI · May 22, 2026, 4:00 AM · Signal 55 · Medium term

ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

arXiv:2605.22081v1. Abstract: We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014–2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shares, comments, and page metadata, enabling joint analysis of language and audience response. The resource includes 200 curated terms (100 racism-related and 100 discrimination-related) with morphological regex families (13+ inflections per lemma), and 20 discrimination axe…
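The abstract mentions morphological regex families but does not say how they are built. As a rough illustration only, the Python sketch below shows one way a single lemma could be expanded into a family that matches common clitic and suffix variants. The stem عنصري ("racist"), the clitic and suffix lists, and the helper names `build_family` and `find_family_tokens` are hypothetical assumptions, not ArabDiscrim's actual patterns. Real Arabic morphology (broken plurals, pronoun suffixes, diacritics) needs considerably more than this.

```python
import re

# HYPOTHETICAL proclitics (not ArabDiscrim's lists): an optional conjunction
# (wa-/fa-), then an optional preposition and/or definite article:
# bi-al / ka-al / al, li + al written as "لل", or a bare bi-/li-/ka-.
PROCLITICS = r"(?:[وف])?(?:[بك]?ال|لل|[بلك])?"

# HYPOTHETICAL suffixes: feminine -a and the sound plurals -in / -un / -at.
SUFFIXES = r"(?:ة|ين|ون|ات)?"


def build_family(stem: str) -> re.Pattern:
    """Compile one regex covering the stem plus optional clitics and suffixes."""
    return re.compile(PROCLITICS + re.escape(stem) + SUFFIXES)


def find_family_tokens(text: str, family: re.Pattern) -> list[str]:
    """Return the word tokens in text that are, in full, a member of the family."""
    # \w is Unicode-aware for str patterns, so Arabic letters form tokens.
    return [tok for tok in re.findall(r"\w+", text) if family.fullmatch(tok)]


# Illustrative lemma: عنصري ("racist", masculine singular).
racist = build_family("عنصري")

forms = ["عنصري", "عنصرية", "عنصريين", "عنصريون", "عنصريات",
         "العنصرية", "والعنصرية", "بالعنصرية", "للعنصرية", "فالعنصريين"]

if __name__ == "__main__":
    for form in forms:
        print(form, bool(racist.fullmatch(form)))
    print(find_family_tokens("هذا عنصر وليس عنصرية", racist))
```

Matching whole tokens with `fullmatch` trades recall for precision: a form outside the listed affixes, such as a pronoun-suffixed variant, is simply missed rather than partially matched.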

Why this matters

Why now

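ArabDiscrim covers a full decade (2014–2024) of public Arabic Facebook posts, 293K in total, and pairs each post's text with engagement data. This gives researchers a resource for studying online racism and discrimination in Arabic beyond what existing Twitter-centric datasets offer.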

Why it’s important

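Because posts carry reactions, shares, comments, and page metadata, researchers can study not only how racism and discrimination are discussed in Arabic but also how audiences respond. That combination can inform tools for understanding digital cultural dynamics and, potentially, for mitigating harmful online content.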

What changes

ArabDiscrim enables more nuanced study of racism and discrimination across Arabic-speaking online communities by supporting joint analysis of language and audience response, using platform-native engagement signals that existing Twitter-centric datasets do not integrate.

Winners

· AI researchers
· Social media platforms
· Content moderation teams
· Digital rights organizations

Losers

· Propagators of online discrimination
· Creators of harmful content

Second-order effects

Direct

Improved detection and moderation of hate speech in Arabic online spaces.

Second

Development of more culturally sensitive and effective AI models for social discourse analysis.

Third

Enhanced understanding of the socio-linguistic patterns of online discrimination, potentially informing public policy and educational initiatives.

Editorial confidence: 85 / 100 · Structural impact: 40 / 100

Original report

This signal links to a primary source. Continuum Brief monitors and indexes it as part of the live intelligence stream — we do not republish source content.

Read at arXiv cs.CL


Tracked by The Continuum Brief · live intelligence network
