XLM-RoBERTa
PulseAugur coverage of XLM-RoBERTa: every cluster mentioning XLM-RoBERTa across labs, papers, and developer communities, ranked by signal.
4 days with sentiment data
-
New dataset RoIt-XMASA aids Romanian and Italian sentiment analysis
Researchers have introduced RoIt-XMASA, a new dataset designed for multilingual sentiment analysis in Romanian and Italian. This dataset includes 36,000 labeled reviews across books, movies, and music, along with over 2…
-
New CA-LIG framework enhances Transformer model explainability
Researchers have developed a new framework called Context-Aware Layer-wise Integrated Gradients (CA-LIG) to improve the explainability of Transformer models. This framework offers a unified, hierarchical approach that c…
-
New pipeline creates NLP resource for historical Greek parliamentary text
Researchers have developed a new, reproducible pipeline for creating a Universal Dependencies-style parsing resource for Katharevousa Greek parliamentary text. This workflow addresses the limitations of current NLP tool…
-
New research tackles continual learning in multilingual and multimodal LLMs
Two new research papers explore advancements in continual learning for large language models. The first paper introduces a multi-stage framework for detecting reclaimed slurs in multilingual social media, utilizing XLM-…
-
XLM-RoBERTa model improves hope speech detection in Tulu
Researchers developed an XLM-RoBERTa-based system for detecting hope speech in code-mixed Tulu social media comments. Their organically adapted model showed improved performance over a baseline on a development set. Whi…
-
New benchmark study explores neural network performance on Tajik POS tagging
This paper introduces the first benchmark for part-of-speech tagging in the Tajik language, evaluating various neural network architectures. The study utilized the TajPersParallel corpus, focusing on context-independent…
-
New Sindhi figurative language dataset SiNFluD released with XLM-RoBERTa-XL benchmark
Researchers have developed SiNFluD, a new dataset for classifying figurative language in Sindhi. The dataset was compiled from various online sources and annotated by native speakers, achieving a high inter-annotator ag…
-
Teams leverage LLMs and ensemble methods for multilingual online polarization detection at SemEval-2026
Researchers have developed systems for SemEval-2026 Task 9, a multilingual polarization detection challenge across 22 languages. One approach fine-tuned Gemma 3 models using Low-Rank Adaptation (LoRA) and augmented data…
-
Researchers create Naamah, a large synthetic Sanskrit NER dataset using LLMs
Researchers have developed Naamah, a synthetic dataset of over 100,000 Sanskrit sentences designed to improve Named Entity Recognition (NER) for classical Sanskrit literature. The dataset was generated by combining enti…
-
XITE technique boosts cross-lingual transfer for language models by up to 81%
Researchers have introduced XITE, a novel data augmentation technique designed to improve cross-lingual transfer in multilingual language models. This method leverages embedding similarities to identify and adapt labels…