PulseAugur
Small language models struggle with PII substitution despite new prompting technique

Researchers have developed a novel on-device system for substituting Personally Identifiable Information (PII) with consistent, type-preserving fake values, aiming to preserve the downstream utility of the text. The system uses a small language model (SLM) for surrogate generation, but initial tests showed the SLM regurgitating demonstration outputs verbatim. A locale-conditioned few-shot prompting technique was introduced to fix this, eliminating the echoes and producing locale-correct surrogates. However, the study found that while SLM surrogates yield more natural text, they produce a less varied training distribution, which hurts downstream Named Entity Recognition (NER) performance compared to simpler methods.
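The core idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the demonstration table, function names (`build_prompt`, `accept`), and example values are all assumptions. It shows the two pieces the summary mentions: few-shot demonstrations conditioned on the target locale, and a check that rejects surrogates that merely echo a demonstration output.

```python
# Hypothetical sketch of locale-conditioned few-shot surrogate prompting.
# All names and demo values here are illustrative, not from the paper.

# Few-shot demonstrations, keyed by locale, so surrogates look locale-correct.
DEMOS = {
    "en_US": [("PERSON", "John Smith", "Michael Turner"),
              ("CITY", "Boston", "Denver")],
    "de_DE": [("PERSON", "Hans Müller", "Stefan Weber"),
              ("CITY", "München", "Leipzig")],
}

def build_prompt(entity_type: str, original: str, locale: str) -> str:
    """Assemble a few-shot prompt whose demonstrations match the target locale."""
    lines = [f"Replace each {entity_type} with a realistic fake value for locale {locale}."]
    for etype, src, fake in DEMOS[locale]:
        lines.append(f"{etype}: {src} -> {fake}")
    lines.append(f"{entity_type}: {original} ->")  # the SLM completes this line
    return "\n".join(lines)

def accept(candidate: str, locale: str, original: str) -> bool:
    """Reject regurgitation: the surrogate must not repeat a demonstration
    output or the original value itself."""
    demo_outputs = {fake for _, _, fake in DEMOS[locale]}
    return candidate not in demo_outputs and candidate != original
```

In this sketch the echo check is a post-filter; the paper's point is that conditioning the demonstrations on locale already reduces regurgitation before any filtering.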


IMPACT SLM-based PII substitution may offer naturalness but sacrifices downstream NER performance due to reduced training data variety.

RANK_REASON Academic paper detailing a novel method for PII substitution and its limitations.

Read on arXiv cs.AI →

COVERAGE [1]

  1. arXiv cs.AI · Deepak Kumar

    Locale-Conditioned Few-Shot Prompting Mitigates Demonstration Regurgitation in On-Device PII Substitution with Small Language Models

    Personally Identifiable Information (PII) redaction usually replaces detected entities with placeholder tokens such as [PERSON], destroying the downstream utility of the redacted text for retrieval and Named Entity Recognition (NER) training. We propose a fully on-device pipeline…
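The abstract's contrast between placeholder redaction and type-preserving substitution can be shown concretely. This minimal sketch is mine, not the paper's pipeline; the example sentence and fake values are invented. Placeholder tokens collapse distinct entities into one symbol, while a consistent per-entity mapping keeps identities and entity types usable for retrieval and NER training.

```python
# Illustrative contrast (assumed example, not from the paper):
# placeholder redaction vs. consistent, type-preserving substitution.

text = "Alice emailed Bob from Boston."

# Placeholder redaction: both people become the same token, so the
# redacted text no longer distinguishes them, and NER training data
# loses all surface variety.
redacted = (text.replace("Alice", "[PERSON]")
                .replace("Bob", "[PERSON]")
                .replace("Boston", "[CITY]"))

# Type-preserving substitution: each real entity maps to one stable
# fake value of the same type, so structure and identity survive.
mapping = {"Alice": "Carol", "Bob": "David", "Boston": "Denver"}
substituted = text
for real, fake in mapping.items():
    substituted = substituted.replace(real, fake)
```

A real pipeline would substitute on detected entity spans rather than raw string matches, but the contrast in outputs is the point here.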