
XLM-RoBERTa

PulseAugur coverage of XLM-RoBERTa: every cluster mentioning XLM-RoBERTa across labs, papers, and developer communities, ranked by signal.

Total · 30d: 18 (18 over 90d)

Releases · 30d: 0 (0 over 90d)

Papers · 30d: 16 (16 over 90d)

SENTIMENT · 30D: 2 days with sentiment data

RECENT · 18 TOTAL

RESEARCH · CL_84458 · Jun 10 · 09:59

New datasets and model advance emotional validation in AI dialogue

Researchers have introduced M-EDESConv and M-TESC, new multilingual datasets for emotional validation in dialogue systems, supporting tasks like response identification and timing detection. They also propose MEGUMI, a …

TOOL · CL_65867 · Jun 2 · 04:00

New SindBERT model advances Turkish NLP capabilities

Researchers have developed SindBERT, a new large-scale RoBERTa-based language model specifically for Turkish. Trained on over 300 GB of Turkish text, SindBERT is available in base and large configurations, marking the f…

RESEARCH · CL_56687 · May 28 · 09:08

Perplexity AI open-sources Rust tokenizer, slashing LLM inference latency

Perplexity AI has open-sourced a new Unigram tokenizer implemented in Rust, which significantly reduces latency and CPU utilization in LLM inference. This new tokenizer achieves up to a 5x lower p50 latency compared to …

TOOL · CL_54988 · May 27 · 15:55

Perplexity AI open-sources Unigram tokenizer for 5x speedup

Perplexity AI has open-sourced a new Unigram tokenizer designed to significantly improve CPU performance. This new tokenizer achieves a 5x reduction in latency compared to Hugging Face's implementation and a 2x reduction…

TOOL · CL_51316 · May 26 · 04:00

New dataset boosts Persian social media text classification

Researchers have introduced PerSoMed, a new large-scale dataset designed for classifying Persian social media text. The dataset contains 36,000 posts across nine categories, with each category having 4,000 samples to en…

TOOL · CL_50954 · May 26 · 04:00

AI models struggle with evolving legal language across geopolitical shifts

Researchers investigated temporal concept drift in legal judgment prediction by training transformer models on Ukrainian court decisions from different geopolitical eras. They found that models trained on older data per…

RESEARCH · CL_51285 · May 25 · 16:26

New NLP models tackle dementia detection in Filipino speech

Researchers have developed a new approach to dementia detection using natural language processing, focusing on low-resource languages like Filipino. They created a bilingual dataset and evaluated several transformer mod…

TOOL · CL_48879 · May 25 · 04:00

New dataset RoIt-XMASA aids Romanian and Italian sentiment analysis

Researchers have introduced RoIt-XMASA, a new dataset designed for multilingual sentiment analysis in Romanian and Italian. This dataset includes 36,000 labeled reviews across books, movies, and music, along with over 2…

TOOL · CL_51847 · May 24 · 16:50

Team DUTH explores multilingual humour retrieval challenges

Researchers from Team DUTH have explored multilingual humour-aware information retrieval using the CLEF 2025 JOKER Task 1 benchmark, which assesses humour retrieval in English and Portuguese. Their approach integrates m…

TOOL · CL_44765 · May 22 · 04:00

New CA-LIG framework enhances Transformer model explainability

Researchers have developed a new framework called Context-Aware Layer-wise Integrated Gradients (CA-LIG) to improve the explainability of Transformer models. This framework offers a unified, hierarchical approach that c…

RESEARCH · CL_48842 · May 21 · 19:16

New pipeline creates NLP resource for historical Greek parliamentary text

Researchers have developed a new, reproducible pipeline for creating a Universal Dependencies-style parsing resource for Katharevousa Greek parliamentary text. This workflow addresses the limitations of current NLP tool…

RESEARCH · CL_30756 · May 13 · 12:10

New research tackles continual learning in multilingual and multimodal LLMs

Two new research papers explore advancements in continual learning for large language models. The first paper introduces a multi-stage framework for detecting reclaimed slurs in multilingual social media, utilizing XLM-…

TOOL · CL_27576 · May 10 · 22:32

XLM-RoBERTa model improves hope speech detection in Tulu

Researchers developed an XLM-RoBERTa-based system for detecting hope speech in code-mixed Tulu social media comments. Their organically adapted model showed improved performance over a baseline on a development set. Whi…

RESEARCH · CL_20602 · May 6 · 07:26

New benchmark study explores neural network performance on Tajik POS tagging

This paper introduces the first benchmark for part-of-speech tagging in the Tajik language, evaluating various neural network architectures. The study utilized the TajPersParallel corpus, focusing on context-independent…

TOOL · CL_15858 · May 5 · 04:00

New Sindhi figurative language dataset SiNFluD released with XLM-RoBERTa-XL benchmark

Researchers have developed SiNFluD, a new dataset for classifying figurative language in Sindhi. The dataset was compiled from various online sources and annotated by native speakers, achieving a high inter-annotator ag…

RESEARCH · CL_15908 · May 4 · 15:08

Teams leverage LLMs and ensemble methods for multilingual online polarization detection at SemEval-2026

Researchers have developed systems for SemEval-2026 Task 9, a multilingual polarization detection challenge across 22 languages. One approach fine-tuned Gemma 3 models using Low-Rank Adaptation (LoRA) and augmented data…

RESEARCH · CL_09818 · Apr 29 · 09:12

Researchers create Naamah, a large synthetic Sanskrit NER dataset using LLMs

Researchers have developed Naamah, a synthetic dataset of over 100,000 Sanskrit sentences designed to improve Named Entity Recognition (NER) for classical Sanskrit literature. The dataset was generated by combining enti…

RESEARCH · CL_06640 · Apr 28 · 04:00

XITE technique boosts cross-lingual transfer in language models by up to 81%

Researchers have introduced XITE, a novel data augmentation technique designed to improve cross-lingual transfer in multilingual language models. This method leverages embedding similarities to identify and adapt labels…