New GLiNER2-PII model excels at multilingual PII extraction

By the PulseAugur editorial team · [1 source] · 2026-05-11 04:29

Researchers have developed GLiNER2-PII, a compact 0.3-billion-parameter model for multilingual personally identifiable information (PII) extraction. Adapted from GLiNER2, it identifies 42 types of PII at the character-span level. To overcome data scarcity and privacy concerns, the authors built a synthetic multilingual training corpus with a constraint-driven generation pipeline. On the SPY benchmark, GLiNER2-PII outperformed the other systems it was compared against, including OpenAI's Privacy Filter. The model has been released on Hugging Face.
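Character-span-level extraction means each detection is reported as exact start and end offsets into the input string rather than as a label on a token. The sketch below shows what a downstream consumer might do with such output, assuming the model's predictions have already been converted into (start, end, label) triples. The `PIISpan` type, the `redact` helper and the label names are hypothetical and are not part of the GLiNER2-PII release.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIISpan:
    """Hypothetical container for one detected PII span."""
    start: int  # inclusive character offset into the input text
    end: int    # exclusive character offset
    label: str  # hypothetical label name, e.g. "person" or "email"


def redact(text: str, spans: list[PIISpan]) -> str:
    """Replace every span with a [LABEL] placeholder.

    Spans are sorted by start offset and the text between them is copied
    through unchanged, so the original offsets stay valid throughout.
    Overlapping, empty or out-of-range spans raise ValueError.
    """
    pieces: list[str] = []
    cursor = 0  # end offset of the last span already handled
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor or span.start >= span.end or span.end > len(text):
            raise ValueError(f"invalid or overlapping span: {span}")
        pieces.append(text[cursor:span.start])    # untouched text before the span
        pieces.append(f"[{span.label.upper()}]")  # placeholder for the PII itself
        cursor = span.end
    pieces.append(text[cursor:])                  # trailing text after the last span
    return "".join(pieces)


if __name__ == "__main__":
    text = "Contact Anna Schmidt at anna.schmidt@example.de."
    spans = [PIISpan(24, 47, "email"), PIISpan(8, 20, "person")]
    print(redact(text, spans))  # Contact [PERSON] at [EMAIL].
```

Because the spans are plain character offsets, redaction reduces to string slicing, and overlapping predictions can be rejected before any text is rewritten.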

Impact: The model offers improved multilingual PII detection, which could strengthen data privacy and security in applications that process personal data.

Ranking rationale: The cluster describes a new research paper presenting a model for PII extraction, including its methodology, performance and public release.

AI-generated summary · Google Gemini · from 1 source.

Sources [1]

arXiv cs.AI (Tier 1) · English (EN) · George Hurn-Maloney · 2026-05-11 04:29

GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction

Reliable detection of personally identifiable information (PII) is increasingly important across modern data-processing systems, yet the task remains difficult: PII spans are heterogeneous, locale-dependent, context-sensitive, and often embedded in noisy or semi-structured docume…
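The article reports that GLiNER2-PII was trained on a synthetic multilingual corpus built with a constraint-driven generation pipeline, and the abstract's point that PII spans are locale-dependent suggests why generated data is attractive: gold character offsets come for free. The following is a minimal sketch of template filling with recorded offsets, assuming two simple constraints (values must come from the matching locale's pool, and every recorded span must reproduce the inserted value exactly). The templates, value pools and `generate` helper are hypothetical; the paper's actual pipeline, constraints and 42-label schema are not reproduced here.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical per-locale templates and value pools, for illustration only.
TEMPLATES = {
    "de": "Bitte rufen Sie {person} unter {phone} an.",
    "fr": "Veuillez appeler {person} au {phone}.",
}
VALUES = {
    "de": {"person": ["Anna Schmidt", "Jonas Weber"],
           "phone": ["+49 30 1234567", "030 7654321"]},
    "fr": {"person": ["Claire Dubois", "Luc Martin"],
           "phone": ["+33 1 23 45 67 89", "01 98 76 54 32"]},
}
SLOT = re.compile(r"\{(\w+)\}")  # matches placeholders such as {person}


@dataclass(frozen=True)
class Example:
    text: str
    spans: list[tuple[int, int, str]]  # (start, end, label) character offsets


def generate(locale: str, rng: random.Random) -> Example:
    """Fill one template with locale-matched values and record gold offsets."""
    template = TEMPLATES[locale]
    parts: list[str] = []
    spans: list[tuple[int, int, str]] = []
    inserted: list[str] = []
    length = 0  # characters emitted so far
    cursor = 0  # current position in the template
    for match in SLOT.finditer(template):
        literal = template[cursor:match.start()]
        parts.append(literal)
        length += len(literal)
        label = match.group(1)
        # Constraint 1: the value is drawn from this locale's pool for the label.
        value = rng.choice(VALUES[locale][label])
        spans.append((length, length + len(value), label))
        inserted.append(value)
        parts.append(value)
        length += len(value)
        cursor = match.end()
    parts.append(template[cursor:])
    text = "".join(parts)
    # Constraint 2: each recorded span must reproduce the inserted value exactly.
    for (start, end, _), value in zip(spans, inserted):
        if text[start:end] != value:
            raise RuntimeError("span offsets do not match the inserted value")
    return Example(text, spans)


if __name__ == "__main__":
    rng = random.Random(0)
    for locale in TEMPLATES:
        example = generate(locale, rng)
        print(example.text, example.spans)
```

A production generator would need far richer templates and validated value formats per locale; the point here is only that labels are derived from construction rather than from annotating real personal data.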
