Brief

last 24h

[8/8] 221 sources

Multi-source AI news clustered, deduplicated, and scored 0–100 across authority, cluster strength, headline signal, and time decay.
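
As a rough illustration of how four such signals could combine into one 0–100 score, here is a minimal sketch. The weights, the 24-hour half-life, the multiplicative decay, and the `Story` and `score` names are all assumptions made for illustration; the feed does not publish its actual formula.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """Hypothetical per-story signals, each assumed to be normalized to 0..1."""
    authority: float         # how trusted the source is
    cluster_strength: float  # how many independent sources cover the story
    headline_signal: float   # how informative the headline is
    age_hours: float         # hours since publication


# Assumed weights (they sum to 1.0) and an assumed decay half-life.
WEIGHTS = {"authority": 0.40, "cluster_strength": 0.35, "headline_signal": 0.25}
HALF_LIFE_HOURS = 24.0


def clamp01(x: float) -> float:
    """Keep a signal inside the 0..1 range."""
    return max(0.0, min(1.0, x))


def score(story: Story) -> float:
    """Weighted sum of the three signals, halved every HALF_LIFE_HOURS, scaled to 0..100."""
    base = (
        WEIGHTS["authority"] * clamp01(story.authority)
        + WEIGHTS["cluster_strength"] * clamp01(story.cluster_strength)
        + WEIGHTS["headline_signal"] * clamp01(story.headline_signal)
    )
    decay = 0.5 ** (max(0.0, story.age_hours) / HALF_LIFE_HOURS)
    return round(100.0 * base * decay, 1)
```

Under these assumptions a story with perfect signals scores 100 when fresh and 50 after one day; a real ranker might instead apply decay additively or use a different curve.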

TOOL · arXiv cs.CL English(EN) · 4d

Benchmarking Commercial ASR Systems on Code-Switching Speech: Arabic, Persian, and German

A new benchmark study evaluated five commercial automatic speech recognition (ASR) systems on code-switching speech in which Arabic, Persian, or German is mixed with English. The study introduced a pipeline that uses GPT-4o and Gemini 1.5 Pro to score transcripts while reducing LLM costs by 91%, and it found BERTScore to be a more reliable metric than traditional Word Error Rate (WER) for certain language pairs. ElevenLabs Scribe v2 was the top performer, achieving the lowest WER and highest BERTScore across all tested language pairs.

IMPACT This research highlights the challenges in ASR for code-switching and introduces a more robust evaluation method, potentially guiding future development of multilingual speech technologies.
- GPT-4o
- Gemini 1.5 Pro
- Arabic
- English
- German
- Persian
- ElevenLabs Scribe v2
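
For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the ASR output, divided by the number of reference words. The sketch below is a generic textbook implementation, not the paper's evaluation code; the `wer` function name and whitespace tokenization are assumptions.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference prefix
    # and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because every mismatched word counts as a full error, a harmless spelling or transliteration variant is penalized as heavily as a wrong word. That behavior is one plausible reason an embedding-based metric such as BERTScore could be more informative on mixed-script transcripts.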
TOOL · arXiv cs.CL English(EN) · 5d

ArPoMeme: An Annotated Arabic Multimodal Dataset for Political Ideology and Polarization

Researchers have introduced ArPoMeme, a new dataset containing approximately 7,300 Arabic political memes. This dataset is annotated with ideological orientations such as Leftist, Islamist, Pan-Arabist, and Satirical, as well as dimensions of polarization like Us vs. Them framing and hostility. The creation of ArPoMeme involved a semi-automated pipeline using web scraping and the Qwen2.5-VL-7B vision-language model for text extraction, followed by manual annotation via a custom interface. Analysis of the dataset indicates that Islamist and satirical memes exhibit the highest levels of hostility and mobilization cues.

IMPACT Provides a new resource for analyzing multimodal political discourse and detecting polarization in Arabic content.
RESEARCH · arXiv cs.CL English(EN) · 4d · [2 sources]

Pattern-and-root inflectional morphology: the Arabic broken plural

Researchers have developed a new model of Arabic inflectional morphology that focuses on broken plurals. The model inverts the traditional root-and-pattern approach into a pattern-and-root system that prioritizes patterns over roots. It separates inflection from derivation and semantics, and it analyzes Arabic text directly from a word dictionary without needing morphophonological rules. The system classifies nouns with triliteral broken plurals into 22 patterns and 90 classes, and nouns with quadriliteral broken plurals into 3 patterns and 70 classes, resulting in 300 inflectional classes when singular variations are considered.

IMPACT This research could improve natural language processing for Arabic by providing a more structured and efficient way to handle inflectional morphology.
RESEARCH · arXiv cs.CL English(EN) · 4d · [2 sources]

Cohesion-6K: An Arabic Dataset for Analyzing Social Cohesion and Conflict in Online Discourse

Researchers have introduced Cohesion-6K, a new dataset designed to analyze social cohesion and conflict within Arabic online discourse. The dataset comprises six thousand Facebook posts related to the Israeli Occupation of Palestine, categorized on a spectrum from conflict to cohesion. Analysis of the data indicates that posts promoting conflict receive significantly more user engagement than those focused on resolution, highlighting a trend of divisive content gaining greater visibility.

IMPACT Provides a new resource for studying online polarization and the impact of AI-assisted annotation in computational social science.
RESEARCH · arXiv cs.CL English(EN) · 5d · [2 sources]

JobArabi: An Arabic Corpus and Analysis of Job Announcements from Social Media

Researchers have developed JobArabi, a new corpus of over 20,000 Arabic job announcements sourced from social media platforms such as X. The corpus was collected between January 2024 and October 2025 using a specialized query framework designed to capture diverse recruitment language. Analysis of the corpus reveals sociolinguistic patterns such as persistent gendered language, regional variation in job demand, and the emotional tone of recruitment messages.

IMPACT Provides a new resource for Arabic NLP and computational social science research into labor market communication.
- JobArabi
- X
- Arabic
RESEARCH · arXiv cs.CL English(EN) · 4d · [2 sources]

Audience Engagement with Arabic Women's Social Empowerment and Wellbeing: A Decadal Corpus

Researchers have developed a decade-long corpus of Arabic Facebook posts focusing on women's social empowerment and wellbeing. The dataset comprises over 250,000 posts from more than 50,000 pages across 77 countries, spanning from 2013 to 2024. It includes extensive user interaction data, such as shares, comments, and emotional reactions, to facilitate large-scale analysis of gender discourse and social reform in Arabic dialects.

IMPACT Enables large-scale analysis of gender discourse and social reform in Arabic dialects, supporting research in NLP and computational social science.
RESEARCH · arXiv cs.CL English(EN) · 3d · [2 sources]

AraHopeCorpus: Annotation Guidelines and Dataset for Hope Speech in Arabic Social Media Crisis Discourse

Researchers have introduced AraHopeCorpus, a new dataset designed to study hope speech in Arabic social media during crises. In the corpus, which is derived from 10,000 YouTube comments related to the Gaza conflict from 2023-2024, over 64% of comments express hope, primarily through religious encouragement, solidarity, and optimism. About 13% are labeled "no hope speech," reflecting despair, and the remainder are neutral or mixed. The authors also report that while large language models like ChatGPT can assist in annotation, they struggle with dialectal and culturally nuanced expressions.

IMPACT Provides a new resource for studying constructive digital discourse and hope speech detection in crisis contexts.
- ChatGPT
- YouTube
- AraHopeCorpus
- Arabic
- social media
RESEARCH · arXiv cs.AI English(EN) · 2w · [2 sources]

CrossCult-KIBench: A Benchmark for Cross-Cultural Knowledge Insertion in MLLMs

Two new research papers highlight challenges in developing AI for non-English languages and cultures. One paper reflects on two decades of building Arabic NLP resources, concluding that social and institutional factors are harder to overcome than linguistic ones. The other paper introduces a benchmark for evaluating how well Multimodal Large Language Models (MLLMs) can adapt to different cultures without negatively impacting their performance in other cultural contexts.

IMPACT Highlights the need for more culturally aware and linguistically diverse AI models, suggesting current approaches struggle with cross-cultural adaptation.