New GLiNER2-PII model excels at multilingual PII extraction

By PulseAugur Editorial · [1 sources] · 2026-05-11 04:29

Researchers have developed GLiNER2-PII, a compact 0.3-billion-parameter model for multilingual extraction of personally identifiable information (PII). Adapted from GLiNER2, the model identifies 42 types of PII at the character-span level. To work around data scarcity and privacy concerns, the researchers created a synthetic multilingual corpus using a constraint-driven generation pipeline. On the SPY benchmark, GLiNER2-PII outperformed the other systems it was compared against, including OpenAI's Privacy Filter. The model has been released on Hugging Face.
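To make "character-span level" concrete, here is a minimal sketch of what span-level PII output can look like and how it might be used for redaction. It does not call GLiNER2-PII or any real library; the PiiSpan type, the span_of and redact helpers, and the "person" and "email" labels are all hypothetical names chosen for illustration, and the spans are built by hand rather than predicted by a model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiSpan:
    """Hypothetical record for one detected PII span (not the model's real output type)."""
    start: int  # inclusive character offset into the text
    end: int    # exclusive character offset into the text
    label: str  # illustrative label name, e.g. "person" or "email"


def span_of(text: str, value: str, label: str) -> PiiSpan:
    """Build a span by locating `value` in `text`; stands in for a model prediction."""
    start = text.index(value)
    return PiiSpan(start, start + len(value), label)


def redact(text: str, spans: list[PiiSpan]) -> str:
    """Replace each span with a [LABEL] placeholder.

    Spans are applied from right to left so that earlier offsets stay valid
    after each replacement. Overlapping spans are not handled in this sketch.
    """
    out = text
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        out = out[:span.start] + f"[{span.label.upper()}]" + out[span.end:]
    return out


text = "Contact Ana Lima at ana@example.com."
spans = [
    span_of(text, "Ana Lima", "person"),
    span_of(text, "ana@example.com", "email"),
]
redacted = redact(text, spans)
```

Because each span carries exact character offsets, downstream code can redact or highlight the original text without re-tokenizing it, which is the practical appeal of span-level output over token-level tags.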

IMPACT This new model offers improved multilingual PII detection, potentially enhancing data privacy and security in various applications.

RANK_REASON The cluster describes a new research paper detailing a novel model for PII extraction, including its methodology, performance, and public release.


AI-generated summary · Google Gemini · from 1 source.

COVERAGE [1]

arXiv cs.AI TIER_1 English(EN) · George Hurn-Maloney · 2026-05-11 04:29

GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction

Reliable detection of personally identifiable information (PII) is increasingly important across modern data-processing systems, yet the task remains difficult: PII spans are heterogeneous, locale-dependent, context-sensitive, and often embedded in noisy or semi-structured docume…
