New Sindhi figurative language dataset SiNFluD released with XLM-RoBERTa-XL benchmark

By PulseAugur Editorial · [1 source] · 2026-05-05 04:00

Researchers have developed SiNFluD, a new dataset for classifying figurative language in Sindhi. The dataset was compiled from blogs, social media platforms, and literary sources and annotated by native speakers, who reached high inter-annotator agreement. Several models, including mBERT, XLM-RoBERTa, and SetFit, were evaluated, with XLM-RoBERTa-XL performing best.

IMPACT Introduces a new benchmark dataset for figurative language classification in Sindhi, enabling further research and model development for low-resource languages.

RANK_REASON This is a research paper introducing a new dataset and evaluating models.

AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.CL TIER_1 English(EN) · Wazir Ali, Adeeb Noor, Saifullah Tumrani · 2026-05-05 04:00

Creating and Evaluating Figurative Language Dataset for Sindhi

arXiv:2605.01323v1 · Abstract: In this article, we introduce SiNFluD, a novel benchmark dataset for Sindhi figurative language classification. We first collect raw text from various blogs, social media platforms, and literary sources, and subsequently prepare the…
