Researchers have introduced PerSoMed, a large-scale dataset for classifying Persian social media text. It contains 36,000 posts split evenly across nine categories, with 4,000 samples per category so that the classes are balanced. The study benchmarks a range of models and finds that transformer-based architectures perform best, with TookaBERT-Large achieving the strongest results. The resource is intended to advance Persian natural language processing (NLP) research.
IMPACT Provides a foundational resource for Persian NLP tasks such as trend analysis and user classification.
RANK_REASON The cluster contains a research paper introducing a new dataset and benchmark results.
AI-generated summary · Google Gemini · from 1 source.