PulseAugur

New Sindhi figurative language dataset SiNFluD released with XLM-RoBERTa-XL benchmark

Researchers have developed SiNFluD, a new benchmark dataset for classifying figurative language in Sindhi. The raw text was collected from blogs, social media platforms, and literary sources, then annotated by native speakers with high inter-annotator agreement. Several models were evaluated, including mBERT, XLM-RoBERTa, and SetFit, with XLM-RoBERTa-XL performing best.

Summary written by gemini-2.5-flash-lite from 1 source.

IMPACT Introduces a new benchmark dataset for figurative language classification in Sindhi, enabling further research and model development for low-resource languages.

RANK_REASON This is a research paper introducing a new dataset and evaluating models.

Read on arXiv cs.CL →

COVERAGE [1]

  1. arXiv cs.CL TIER_1 · Wazir Ali, Adeeb Noor, Saifullah Tumrani

    Creating and Evaluating Figurative Language Dataset for Sindhi

    arXiv:2605.01323v1 Announce Type: new Abstract: In this article, we introduce SiNFluD, a novel benchmark dataset for Sindhi figurative language classification. We first collect raw text from various blogs, social media platforms, and literary sources, and subsequently prepare the…