PulseAugur

New Arabic Facebook corpus details racism and discrimination

Researchers have introduced ArabDiscrim, a corpus of 293,000 public Arabic Facebook posts from 2014 to 2024 that discuss racism and discrimination. Unlike existing Twitter-centric datasets, it includes platform-native engagement signals such as reactions and shares, along with page metadata, so that language can be analyzed together with audience interaction. The resource also provides 200 curated terms related to racism and discrimination, 20 discrimination axes, and explicit attribution patterns, and it aims to advance fairness-oriented Arabic natural language processing (NLP).

IMPACT Provides a foundational resource for developing fairer and more context-aware Arabic NLP models, particularly for analyzing social issues.

RANK_REASON The cluster describes a new academic paper detailing a dataset release.

Read on arXiv cs.CL →

AI-generated summary · Google Gemini · from 2 sources.

COVERAGE [2]

  1. arXiv cs.CL TIER_1 English(EN) · Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, Houda Bouamor

    ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

    arXiv:2605.22081v1. Abstract: We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014-2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-nati…

  2. arXiv cs.CL TIER_1 English(EN) · Houda Bouamor

    ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination

    We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014-2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shar…